Abstract

We study the problem of parameter estimation using maximum likelihood for fast/slow systems of stochastic differential equations. Our aim is to shed light on the problem of model/data mismatch at small scales. We consider two classes of fast/slow problems for which a closed coarse-grained equation for the slow variables can be rigorously derived, which we refer to as averaging and homogenization problems. We ask whether, given data from the slow variable in the fast/slow system, we can correctly estimate parameters in the drift of the coarse-grained equation for the slow variable, using maximum likelihood. We show that, whereas the maximum likelihood estimator is asymptotically unbiased for the averaging problem, for the homogenization problem maximum likelihood fails unless we subsample the data at an appropriate rate. An explicit formula for the asymptotic error in the log likelihood function is presented. Our theory is applied to two simple examples from molecular dynamics.

1 Introduction

Fitting stochastic differential equations (SDEs) to time-series data is often a useful way of extracting simple model fits which capture important aspects of the dynamics [9]. However, whilst the data may well be compatible with an SDE model in many respects, it is often incompatible with the desired model at small scales. Since many commonly applied statistical techniques see the data at small scales, this can lead to inconsistencies between the data and the desired model fit. This phenomenon appears quite often in econometrics [1, 2, 13], where the term market microstructure noise is used to describe the high-frequency/small-scale part of the data, as well as in molecular dynamics [19]. In essence, the problem that we are facing is that there is an inconsistency, at small scales, between the coarse-grained model that we are using and the microscopic dynamics from which the data is generated. Similar problems appear quite often in statistical inference, in the context of parameter estimation for misspecified or incorrect models [11, Sec. 2.6].

The aim of this paper is to create a theoretical framework in which it is possible to study this issue, in order to gain better insight into how it is manifest in practice, and how to overcome it. In particular our goal is to investigate the following problem: how can we fit data obtained from the high-dimensional, multiscale full dynamics to a low-dimensional, coarse-grained model which governs the evolution of the resolved ("slow") degrees of freedom? We will study this question for a class of stochastic systems for which we can rigorously derive a coarse-grained description for the dynamics of the resolved variables. More specifically, we will work in the framework of coupled systems of multiscale SDEs for a pair of unknown functions (x(t), y(t)). We assume that y(t) is fast, relative to x(t), and that the equations average or homogenize to give a closed equation for X(t) to which x(t) converges in the limit of infinite scale separation. The function X(t) then approximates x(t), typically in the sense of weak convergence of probability measures [7, 20]. We then ask the following question: given data for x(t), from the coupled system, can we correctly identify parameters in the averaged or homogenized model for X(t)?

Fast/slow systems of SDEs of this form have been studied extensively over the last 



Recently, various methods have been proposed for solving these SDEs numerically [6, 8, 23]. In these works, the coefficients of the limiting SDE are calculated "on the fly" from simulations of the fast/slow
system. There is a direct link between these numerical methods and our approach in 
that our goal is also to infer information about the coefficients in the coarse-grained 
equation using data from the multiscale system. However, our interest is mainly in sit- 
uations where the "microscopic" multiscale system is not known explicitly. From this 
point of view, we merely use the multiscale stochastic system as our "data generating 
process"; our goal is to fit this data to the coarse-grained equation for X(t), the limit 
of the slow variable x(t). 

A first step towards the understanding of this problem was taken in [19]. There, the data generating process x(t) was taken to be the path of a particle moving in a multiscale potential under the influence of thermal noise. The goal was to identify parameters in the drift as well as the diffusion coefficient in the homogenized model for X(t), the weak limit of x(t). It was shown that the maximum likelihood estimator is asymptotically biased and that subsampling is necessary in order to estimate the parameters of the homogenized limit correctly, based on a time series (i.e. single observation) of x(t).

In this paper we extend the analysis to more general classes of fast/slow systems of SDEs for which either an averaging or homogenization principle holds [20]. We consider cases where the drift in the averaged or homogenized equation contains parameters which we want to estimate using observations of the slow variable in the fast/slow system. We show that in the case of averaging the maximum likelihood estimator is asymptotically unbiased and that we can estimate correctly the parameters of the drift in the averaged model from a single path of the slow variable x(t). On the other hand, we show rigorously that the maximum likelihood estimator is asymptotically biased for homogenization problems. In particular, an additional term appears in the likelihood function in the limit of infinite scale separation. We then show that this term vanishes, and hence that the maximum likelihood estimator becomes asymptotically unbiased, provided that we subsample at an appropriate rate.

To be more specific, in this paper we will consider fast/slow systems of SDEs of the form

\[ \frac{dx}{dt} = f_1(x,y) + \alpha_0(x,y)\frac{dU}{dt} + \alpha_1(x,y)\frac{dV}{dt}, \tag{1.1a} \]
\[ \frac{dy}{dt} = \frac{1}{\epsilon}\, g_0(x,y) + \frac{1}{\sqrt{\epsilon}}\,\beta(x,y)\frac{dV}{dt}; \tag{1.1b} \]

or the SDEs

\[ \frac{dx}{dt} = \frac{1}{\epsilon}\, f_0(x,y) + f_1(x,y) + \alpha_0(x,y)\frac{dU}{dt} + \alpha_1(x,y)\frac{dV}{dt}, \tag{1.2a} \]
\[ \frac{dy}{dt} = \frac{1}{\epsilon^2}\, g_0(x,y) + \frac{1}{\epsilon}\, g_1(x,y) + \frac{1}{\epsilon}\,\beta(x,y)\frac{dV}{dt}. \tag{1.2b} \]

We will refer to equations (1.1) as the averaging problem and to equations (1.2) as the homogenization problem. In both cases our assumptions on the coefficients in the SDEs are such that a coarse-grained (averaged or homogenized) equation exists, which is of the form

\[ \frac{dX}{dt} = F(X;\theta) + K(X)\frac{dW}{dt}. \tag{1.3} \]

The slow variable $x(t)$ converges weakly, in the limit as $\epsilon \to 0$, to $X(t)$, the solution of (1.3). We assume that the vector field $F(X;\theta)$ depends on a set of parameters $\theta$ that we want to estimate based on data from either the averaging or the homogenization problem. We suppose that the actual drift compatible with the data is given by $F(X) = F(X;\theta_0)$. We ask whether it is possible to correctly identify $\theta = \theta_0$ by finding the maximum likelihood estimator (MLE) when using a statistical model of the form (1.3), but given data from (1.1) or (1.2). Our main results can be stated, informally, as follows.

Theorem 1.1. Assume that we are given continuous time data. The MLE for the averaging problem (i.e. fitting data from (1.1a) to (1.3)) is asymptotically unbiased. On the other hand, the MLE for the homogenization problem (i.e. fitting data from (1.2a) to (1.3)) is asymptotically biased and an explicit formula for the asymptotic error in the likelihood, $E_\infty$, can be obtained.

Precise statements of the above results can be found in Theorems 3.10, 3.12 and 3.13.

The failure of the MLE when applied to the homogenization problem is due to the presence of high frequency data. Naturally, in order to be able to identify correctly the parameter $\theta = \theta_0$ in (1.3) using data from (1.2a), subsampling at an appropriate rate is necessary.

Theorem 1.2. The MLE for the homogenization problem becomes asymptotically un- 
biased if we subsample at an appropriate rate. 






Roughly speaking, the sampling rate should be between the two characteristic time scales of the fast/slow SDEs (1.2), $1$ and $\epsilon^2$. The precise statement of this result can be found in Theorems 4.1 and 4.5. In practice real data will not come explicitly from a scale-separated model like (1.1a) or (1.2a). However, real data is often multiscale in character. Thus the results in this paper shed light on the pitfalls that may arise when fitting simplified statistical models to multiscale data. Furthermore the results indicate the central, and subtle, role played by subsampling data in order to overcome mismatch between model and data at small scales.
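The subsampling step can be sketched in a few lines. This is only an illustration under stated assumptions: the path is stored on an equally spaced grid with spacing `dt`, the helper name `subsample` is hypothetical, and the choice `delta = eps` is just one spacing lying between $\epsilon^2$ and $1$; the precise admissible rates are those of the theorems in Section 4, not this sketch.

```python
import numpy as np

def subsample(path, dt, delta):
    """Keep every k-th observation so that the retained spacing is about delta."""
    stride = max(1, int(round(delta / dt)))
    return np.asarray(path)[::stride], stride * dt

eps = 0.1
dt = 1e-3                               # fine grid, resolving the fast scale eps**2
delta = eps                             # hypothetical choice with eps**2 < delta < 1
fine = np.linspace(0.0, 1.0, 1001)      # stand-in "path": here simply the time grid
coarse, spacing = subsample(fine, dt, delta)
```

The estimator would then be evaluated on `coarse` with time step `spacing` instead of on the full high-frequency record.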

The rest of the paper is organized as follows. In Section 2 we study the fast/slow stochastic systems introduced above, and prove appropriate averaging and homogenization theorems. In Section 3 we introduce the maximum likelihood function for (1.3) and study its limiting behavior, given data from the averaging and homogenization problems (1.1a) and (1.2a). In Section 4 we show that, when subsampling at an appropriate rate, the maximum likelihood estimator for the homogenization problem becomes asymptotically unbiased. In Section 5 we present examples of fast/slow stochastic systems that fit into the general framework of this paper. Section 6 is reserved for conclusions. Various technical results are proved in the appendices.



2 Set-Up 

We will consider fast/slow systems of SDEs for the variables $(x,y)\in\mathcal{X}\times\mathcal{Y}$. We can take, for example, $\mathcal{X}\times\mathcal{Y} = \mathbb{R}^l\times\mathbb{R}^{d-l}$ or $\mathcal{X}\times\mathcal{Y} = \mathbb{T}^l\times\mathbb{T}^{d-l}$. In the second case, where the state space is compact, all of the assumptions that we need for the proofs of our results can be justified using elliptic PDE theory.

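Let $\varphi^t_\xi(y)$ denote the Markov process which solves the SDE

\[ \frac{d}{dt}\varphi^t_\xi(y) = g_0\bigl(\xi,\varphi^t_\xi(y)\bigr) + \beta\bigl(\xi,\varphi^t_\xi(y)\bigr)\frac{dV}{dt}, \qquad \varphi^0_\xi(y) = y. \tag{2.1} \]

Here $\xi\in\mathcal{X}$ is a fixed parameter and, for each $t\ge 0$, $\varphi^t_\xi(y)\in\mathcal{Y}$, $g_0:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^{d-l}$, $\beta:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^{(d-l)\times m}$ and $V$ is a standard Brownian motion in $m$ dimensions.¹ The generator of the process is

\[ \mathcal{L}_0(\xi) = g_0(\xi,y)\cdot\nabla_y + \tfrac{1}{2}B(\xi,y):\nabla_y\nabla_y \tag{2.2} \]

with $B(\xi,y) := \beta(\xi,y)\beta(\xi,y)^T$. Notice that $\mathcal{L}_0(\xi)$ is a differential operator in $y$ alone, with $\xi$ a parameter.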

Our interest is in data generated by the projection onto the $x$ coordinate of systems of SDEs for $(x,y)$ in $\mathcal{X}\times\mathcal{Y}$. In particular, for $U$ a standard Brownian motion in $\mathbb{R}^n$ we will consider either of the following coupled systems of SDEs:

\[ \frac{dx}{dt} = f_1(x,y) + \alpha_0(x,y)\frac{dU}{dt} + \alpha_1(x,y)\frac{dV}{dt}, \tag{2.3a} \]
\[ \frac{dy}{dt} = \frac{1}{\epsilon}\, g_0(x,y) + \frac{1}{\sqrt{\epsilon}}\,\beta(x,y)\frac{dV}{dt}; \tag{2.3b} \]



1 Throughout this paper we write stochastic differential equations as identities in fully differentiated form, even though Brownian motion is not differentiable. In all cases the identity should be interpreted as holding in integrated form, with the Itô interpretation of the stochastic integral.
or the SDEs

\[ \frac{dx}{dt} = \frac{1}{\epsilon}\, f_0(x,y) + f_1(x,y) + \alpha_0(x,y)\frac{dU}{dt} + \alpha_1(x,y)\frac{dV}{dt}, \tag{2.4a} \]
\[ \frac{dy}{dt} = \frac{1}{\epsilon^2}\, g_0(x,y) + \frac{1}{\epsilon}\, g_1(x,y) + \frac{1}{\epsilon}\,\beta(x,y)\frac{dV}{dt}. \tag{2.4b} \]



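Here $f_i:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^l$, $\alpha_0:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^{l\times n}$, $\alpha_1:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^{l\times m}$, $g_1:\mathcal{X}\times\mathcal{Y}\to\mathbb{R}^{d-l}$, and $g_0$, $\beta$ and $V$ are as above.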

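Assumptions 2.1.

• The equation

\[ -\mathcal{L}_0^*(\xi)\rho(y;\xi) = 0, \qquad \int_{\mathcal{Y}}\rho(y;\xi)\,dy = 1 \]

has a unique non-negative solution $\rho(y;\xi)\in L^1(\mathcal{Y})$ for every $\xi\in\mathcal{X}$; furthermore $\rho(y;\xi)$ is $C^\infty$ in $y$ and $\xi$.

• For each $\xi\in\mathcal{X}$ define the weighted Hilbert space $L^2_\rho(\mathcal{Y};\xi)$ with inner product

\[ \langle a,b\rangle_\rho := \int_{\mathcal{Y}}\rho(y;\xi)a(y)b(y)\,dy. \]

For all $\xi\in\mathcal{X}$ the Poisson equation

\[ -\mathcal{L}_0(\xi)\Theta(y;\xi) = h(y;\xi), \qquad \int_{\mathcal{Y}}\rho(y;\xi)\Theta(y;\xi)\,dy = 0 \]

has a unique solution $\Theta(y;\xi)\in L^2_\rho(\mathcal{Y};\xi)$, provided that

\[ \int_{\mathcal{Y}}\rho(y;\xi)h(y;\xi)\,dy = 0. \]

• The functions $f_i$, $g_i$, $\alpha_i$, $\beta$ and all derivatives are uniformly bounded in $\mathcal{X}\times\mathcal{Y}$.

• If $h(y;\xi)$ and all its derivatives with respect to $y$, $\xi$ are uniformly bounded in $\mathcal{X}\times\mathcal{Y}$ then the same is true of $\Theta$ solving the Poisson equation above.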

Remark 2.2. In the case where the state space of the fast process is compact, $\mathcal{Y} = \mathbb{T}^{d-l}$, and the diffusion matrix $B(\xi,y)$ is positive definite, the above assumptions can be easily proved using elliptic PDE theory [20, Ch. 6]. Similar results can also be proved without the compactness and uniform ellipticity assumptions [15, 16, 17].

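The first assumption essentially states that the process (2.1) is ergodic, for each $\xi\in\mathcal{X}$. Let $\mathcal{L}_0 = \mathcal{L}_0(x)$ and define

\[ \mathcal{L}_1 = f_0\cdot\nabla_x + g_1\cdot\nabla_y + C:\nabla_y\nabla_x, \]
\[ \mathcal{L}_2 = f_1\cdot\nabla_x + \tfrac{1}{2}A:\nabla_x\nabla_x, \]

where

\[ A(x,y) = \alpha_0(x,y)\alpha_0(x,y)^T + \alpha_1(x,y)\alpha_1(x,y)^T, \qquad C(x,y) = \alpha_1(x,y)\beta(x,y)^T. \]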
The generators for the Markov processes defined by equations (2.3) and (2.4) respectively are

\[ \mathcal{L}_{\mathrm{av}} = \frac{1}{\epsilon}\mathcal{L}_0 + \frac{1}{\sqrt{\epsilon}}\mathcal{L}_1 + \mathcal{L}_2, \tag{2.5} \]
\[ \mathcal{L}_{\mathrm{hom}} = \frac{1}{\epsilon^2}\mathcal{L}_0 + \frac{1}{\epsilon}\mathcal{L}_1 + \mathcal{L}_2, \tag{2.6} \]

with the understanding that $f_0 = 0$ and $g_1 = 0$ in the case of $\mathcal{L}_{\mathrm{av}}$. We let $\Omega$ denote the probability space for the pair of Brownian motions $U$, $V$.

In (2.3) (resp. (2.4)) the dynamics for $y$ with $x$ viewed as frozen has solution $\varphi_x^{t/\epsilon}(y(0))$ (resp. $\varphi_x^{t/\epsilon^2}(y(0))$). Of course $x$ is not frozen, but since it evolves much more slowly than $y$, intuition based on freezing $x$ and considering the process (2.1) is useful in understanding how averaging and homogenization arise for equations (2.3) and (2.4) respectively. Specifically, for (2.3) on timescales long compared with $\epsilon$ and short compared to $1$, $x$ will be approximately frozen and $y$ will traverse its invariant measure with density $\rho(y;x)$. We may thus average over this measure and eliminate $y$. Similar ideas hold for equation (2.4), but are complicated by the presence of the term $\epsilon^{-1}f_0$. These ideas underlie the averaging and homogenization results contained in the next two subsections.
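The frozen-$x$ picture can be checked numerically on a toy case. The sketch below is not from the paper: it assumes a hypothetical fast Ornstein-Uhlenbeck process ($g_0 = -y$, $\beta = \sqrt{2}$, so that $\rho(y;x)$ is the standard Gaussian density) and a hypothetical slow drift $f_1(x,y) = -y^2 x$, for which the average against $\rho$ is $-x$. The function name `frozen_time_average` is likewise illustrative.

```python
import numpy as np

def frozen_time_average(f1, x, n_steps=100_000, h=0.1, seed=0):
    """Time-average f1(x, y) along a fast OU path with x held fixed.

    h is the step in fast time t/eps; the OU transition is sampled exactly.
    """
    rng = np.random.default_rng(seed)
    decay = np.exp(-h)
    noise_scale = np.sqrt(1.0 - decay**2)
    xi = rng.standard_normal(n_steps)
    y, total = 0.0, 0.0
    for k in range(n_steps):
        y = decay * y + noise_scale * xi[k]   # exact OU update over fast time h
        total += f1(x, y)
    return total / n_steps

f1 = lambda x, y: -(y**2) * x     # hypothetical slow drift, not from the paper
x_frozen = 0.5
F_estimate = frozen_time_average(f1, x_frozen)
F_exact = -x_frozen               # integral of f1 against the N(0, 1) density
```

With these assumptions the long-time average `F_estimate` should be close to `F_exact`, which is the averaging mechanism described above in its simplest form.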



2.1 Averaging 

Define $F:\mathcal{X}\to\mathbb{R}^l$ and $K:\mathcal{X}\to\mathbb{R}^{l\times l}$ by

\[ F(x) := \int_{\mathcal{Y}} f_1(x,y)\rho(y;x)\,dy \]

and

\[ K(x)K(x)^T := \int_{\mathcal{Y}} \bigl(\alpha_0(x,y)\alpha_0(x,y)^T + \alpha_1(x,y)\alpha_1(x,y)^T\bigr)\rho(y;x)\,dy. \]

Note that $K(x)K(x)^T$ is positive semidefinite and hence $K(x)$ is well defined via, for example, the Cholesky decomposition.
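A minimal numerical sketch of this construction follows. It assumes that samples from $\rho(y;x)$ are available (here replaced by standard normal draws as a stand-in), and the coefficient functions `alpha0`, `alpha1` and the helper name `averaged_diffusion_factor` are hypothetical. Note that `numpy.linalg.cholesky` needs a strictly positive definite matrix, which is a stronger requirement than the semidefiniteness noted above.

```python
import numpy as np

def averaged_diffusion_factor(alpha0, alpha1, y_samples, x):
    """Monte Carlo estimate of K(x), where K K^T averages a0 a0^T + a1 a1^T."""
    l = alpha0(x, y_samples[0]).shape[0]
    kkT = np.zeros((l, l))
    for y in y_samples:
        a0, a1 = alpha0(x, y), alpha1(x, y)
        kkT += a0 @ a0.T + a1 @ a1.T
    kkT /= len(y_samples)
    # Lower-triangular factor; raises LinAlgError if kkT is not positive definite.
    return np.linalg.cholesky(kkT), kkT

# Hypothetical coefficients with l = 2, n = 2, m = 1.
alpha0 = lambda x, y: np.array([[1.0, 0.0], [0.5 * np.tanh(y), 1.0]])
alpha1 = lambda x, y: np.array([[0.2], [0.1 * np.cos(y)]])

rng = np.random.default_rng(0)
y_samples = rng.standard_normal(2000)   # stand-in for draws from rho(y; x)
K_x, kkT = averaged_diffusion_factor(alpha0, alpha1, y_samples, x=0.0)
```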

Theorem 2.3. Let Assumptions 2.1 hold and let $x(0) = X(0)$. Then $x \Rightarrow X$ in $C([0,T],\mathcal{X})$ and $X$ solves the SDE

\[ \frac{dX}{dt} = F(X) + K(X)\frac{dW}{dt}, \tag{2.7} \]

where $W$ is a standard $l$-dimensional Brownian motion.

We use the notation $\Omega_0$ to denote the probability space for the Brownian motion $W$.

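Proof. Consider the Poisson equation

\[ -\mathcal{L}_0\Xi(y;x) = f_1(x,y) - F(x), \qquad \int_{\mathcal{Y}}\rho(y;x)\Xi(y;x)\,dy = 0, \]

with unique solution $\Xi(y;x)\in L^2_\rho(\mathcal{Y};x)$. Applying Itô's formula to $\Xi$ we obtain

\[ \frac{d\Xi}{dt} = \frac{1}{\epsilon}\mathcal{L}_0\Xi + \frac{1}{\sqrt{\epsilon}}\mathcal{L}_1\Xi + \mathcal{L}_2\Xi + \frac{1}{\sqrt{\epsilon}}\nabla_y\Xi\,\beta\frac{dV}{dt} + \nabla_x\Xi\,\alpha_0\frac{dU}{dt} + \nabla_x\Xi\,\alpha_1\frac{dV}{dt}. \]

From this we obtain

\[ \int_0^t \bigl(f_1(x(s),y(s)) - F(x(s))\bigr)\,ds = e_0(t) \]

where

\[ e_0(t) = \sqrt{\epsilon}\int_0^t \bigl(\mathcal{L}_1\Xi\,ds + \nabla_y\Xi\,\beta\,dV\bigr) + \epsilon\int_0^t \bigl(\mathcal{L}_2\Xi\,ds + \nabla_x\Xi\,\alpha_0\,dU + \nabla_x\Xi\,\alpha_1\,dV\bigr) + \epsilon\bigl(\Xi(y(0);x(0)) - \Xi(y(t);x(t))\bigr). \]

Thus, by Assumptions 2.1 and the Burkholder-Davis-Gundy inequality,

\[ e_0 \to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]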

Hence

\[ x(t) = x(0) + \int_0^t F(x(s))\,ds + M(t) + e_0(t) \]

with

\[ M(t) := \int_0^t \alpha_0(x(s),y(s))\,dU(s) + \int_0^t \alpha_1(x(s),y(s))\,dV(s). \]

The quadratic variation process for $M(t)$ is

\[ \langle M\rangle_t = \int_0^t A(x(s),y(s))\,ds, \]

where

\[ A(x,y) = \alpha_0(x,y)\alpha_0(x,y)^T + \alpha_1(x,y)\alpha_1(x,y)^T. \]

By use of the Poisson equation technique applied above to show that $f_1(x,y)$ can be approximated by $F(x)$ (its average against the fast $y$ process), we can show similarly that

\[ \int_0^t A(x(s),y(s))\,ds = \int_0^t K(x(s))K(x(s))^T\,ds + e_1(t) \]

where, as above,

\[ e_1 \to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]

Let

\[ B(t) = x(0) + \int_0^t F(x(s))\,ds + e_0(t), \]
\[ q(t) = \int_0^t K(x(s))K(x(s))^T\,ds + e_1(t). \]

Then

\[ x(t) = B(t) + M(t), \]
where $M(t)$ and $M(t)M(t)^T - q(t)$ are $\mathcal{F}_t$ martingales, where $\mathcal{F}_t$ is the filtration generated by $\sigma\bigl((U(s),V(s)),\ s\le t\bigr)$. Let $C_c^\infty(\mathcal{X})$ denote the space of compactly supported $C^\infty$ functions. The martingale problem for

\[ \mathcal{A} = \bigl\{\bigl(f,\ F\cdot\nabla f + \tfrac{1}{2}KK^T:\nabla\nabla f\bigr) : f\in C_c^\infty(\mathcal{X})\bigr\} \]

is well posed and $x(s)$, $y(s)$ and $X(s)$ are continuous. By $L^2$ convergence of the $e_i$ to $0$ in $C([0,T],\mathcal{X})$ we deduce convergence to $0$ in probability, in the same space. Hence by a slight generalization of Theorem 4.1 in Chapter 7 of [7] we deduce the desired result. □

2.2 Homogenization 

In order for the equations (2.4) to produce a sensible limit as $\epsilon\to 0$ it is necessary to impose a condition on $f_0$. Specifically we assume the following which, roughly, says that $f_0(x,y)$ averages to zero against the invariant measure of the fast $y$ process, with $x$ fixed.

Assumptions 2.4. The function $f_0$ satisfies the centering condition

\[ \int_{\mathcal{Y}}\rho(y;x)f_0(x,y)\,dy = 0. \]

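Let $\Phi(y;x)\in L^2_\rho(\mathcal{Y};x)$ be the solution of the equation

\[ -\mathcal{L}_0\Phi(y;x) = f_0(x,y), \qquad \int_{\mathcal{Y}}\rho(y;x)\Phi(y;x)\,dy = 0, \tag{2.8} \]

which is unique by Assumptions 2.4. Define

\[ F_0(x) := \int_{\mathcal{Y}} (\mathcal{L}_1\Phi)(x,y)\rho(y;x)\,dy = \int_{\mathcal{Y}} \bigl((\nabla_x\Phi f_0)(x,y) + (\nabla_y\Phi g_1)(x,y) + (\alpha_1\beta^T:\nabla_y\nabla_x\Phi)(x,y)\bigr)\rho(y;x)\,dy, \]
\[ F_1(x) := \int_{\mathcal{Y}} f_1(x,y)\rho(y;x)\,dy \quad\text{and} \]
\[ F(x) = F_0(x) + F_1(x). \]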

Also define

\[ A_1(x)A_1(x)^T := \int_{\mathcal{Y}} \bigl((\nabla_y\Phi\,\beta + \alpha_1)(\nabla_y\Phi\,\beta + \alpha_1)^T\bigr)(x,y)\rho(y;x)\,dy, \]
\[ A_0(x)A_0(x)^T := \int_{\mathcal{Y}} \alpha_0(x,y)\alpha_0(x,y)^T\rho(y;x)\,dy \quad\text{and} \]
\[ K(x)K(x)^T = A_0(x)A_0(x)^T + A_1(x)A_1(x)^T. \]

Note that $K(x)K(x)^T$ is positive semidefinite by construction so that $K(x)$ is well defined by, for example, the Cholesky decomposition.
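As an illustration of these definitions, consider the following scalar toy case. It is a hypothetical example, chosen here because the Poisson equation (2.8) can be solved by hand; it is not taken from the paper, and its unbounded coefficients do not satisfy the boundedness part of Assumptions 2.1, so the calculation should be read as formal.

```latex
% Hypothetical scalar example (l = 1, d - l = 1), formal calculation only.
\begin{align*}
&f_0(x,y) = y,\quad f_1(x,y) = f_1(x),\quad g_0(x,y) = -y,\quad g_1 = 0,\quad
  \beta = \sqrt{2},\quad \alpha_0 = \alpha_1 = 0,\\
&\mathcal{L}_0 = -y\,\partial_y + \partial_y^2,\qquad
  \rho(y;x) = (2\pi)^{-1/2}e^{-y^2/2},\qquad
  \int_{\mathcal{Y}}\rho(y;x)\,f_0(x,y)\,dy = 0,\\
&-\mathcal{L}_0\Phi = y \;\Longrightarrow\; \Phi(y;x) = y,\\
&\mathcal{L}_1\Phi = f_0\,\partial_x\Phi + g_1\,\partial_y\Phi
  + \alpha_1\beta\,\partial_y\partial_x\Phi = 0
  \;\Longrightarrow\; F_0(x) = 0,\quad F(x) = F_1(x) = f_1(x),\\
&A_1(x)^2 = \int_{\mathcal{Y}} (\partial_y\Phi\,\beta)^2\rho(y;x)\,dy = 2,\quad
  A_0(x) = 0 \;\Longrightarrow\; K(x) = \sqrt{2}.
\end{align*}
```

Under these assumptions the fast term $\epsilon^{-1}y$ contributes no drift but a diffusion coefficient $\sqrt{2}$ to the limiting equation, which is the familiar white-noise limit of an Ornstein-Uhlenbeck process.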
Theorem 2.5. Let Assumptions 2.1, 2.4 hold. Then $x \Rightarrow X$ in $C([0,T],\mathcal{X})$ and $X$ solves the SDE

\[ \frac{dX}{dt} = F(X) + K(X)\frac{dW}{dt}, \tag{2.9} \]

where $W$ is a standard $l$-dimensional Brownian motion.
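Proof. We consider three Poisson equations: that for $\Phi$ given above and

\[ -\mathcal{L}_0\chi(y;x) = f_1(x,y) - F_1(x), \qquad \int_{\mathcal{Y}}\rho(y;x)\chi(y;x)\,dy = 0, \tag{2.10a} \]
\[ -\mathcal{L}_0\Psi(y;x) = (\mathcal{L}_1\Phi)(x,y) - F_0(x), \qquad \int_{\mathcal{Y}}\rho(y;x)\Psi(y;x)\,dy = 0. \tag{2.10b} \]

All of these equations have a unique solution since the right hand sides average to zero against the density $\rho(y;x)$ by assumption ($\Phi$) or by construction ($\chi$, $\Psi$). By the Itô formula we obtain

\[ \frac{d\Phi}{dt} = \frac{1}{\epsilon^2}\mathcal{L}_0\Phi + \frac{1}{\epsilon}\mathcal{L}_1\Phi + \mathcal{L}_2\Phi + \frac{1}{\epsilon}\nabla_y\Phi\,\beta\frac{dV}{dt} + \nabla_x\Phi\,\alpha_0\frac{dU}{dt} + \nabla_x\Phi\,\alpha_1\frac{dV}{dt}. \]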

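From this we obtain, using arguments similar to those in the proof of Theorem 2.3,

\[ \frac{1}{\epsilon}\int_0^t f_0(x,y)\,ds = \int_0^t (\mathcal{L}_1\Phi)(x(s),y(s))\,ds + \int_0^t (\nabla_y\Phi\,\beta)(x(s),y(s))\,dV(s) + e_0(t) \]

where

\[ e_0(t)\to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr) \]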



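and where, recall, $\Omega$ is the probability space for $(U,V)$. Applying Itô's formula to $\chi$, the solution of (2.10a), we may show that

\[ \int_0^t \bigl(f_1(x(s),y(s)) - F_1(x(s))\bigr)\,ds = e_1(t) \]

where

\[ e_1(t)\to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]

Thus

\[ x(t) = x(0) + \int_0^t (\mathcal{L}_1\Phi)(x(s),y(s))\,ds + \int_0^t F_1(x(s))\,ds + \int_0^t (\nabla_y\Phi\,\beta)(x(s),y(s))\,dV(s) + \int_0^t \alpha_0(x(s),y(s))\,dU(s) + \int_0^t \alpha_1(x(s),y(s))\,dV(s) + e_2(t) \]

and

\[ e_2(t)\to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]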
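By applying Itô's formula to the solution of (2.10b) we obtain

\[ \frac{d\Psi}{dt} = \frac{1}{\epsilon^2}\mathcal{L}_0\Psi + \frac{1}{\epsilon}\mathcal{L}_1\Psi + \mathcal{L}_2\Psi + \frac{1}{\epsilon}\nabla_y\Psi\,\beta\frac{dV}{dt} + \nabla_x\Psi\,\alpha_0\frac{dU}{dt} + \nabla_x\Psi\,\alpha_1\frac{dV}{dt}. \]

From this we obtain

\[ \int_0^t \bigl(\mathcal{L}_1\Phi - F_0\bigr)(x,y)\,ds = e_3(t) \]

where

\[ e_3(t)\to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]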

Thus

\[ x(t) = x(0) + \int_0^t F(x(s))\,ds + M(t) + e_4(t) \quad\text{and} \]
\[ M(t) := \int_0^t \alpha_0(x(s),y(s))\,dU(s) + \int_0^t (\nabla_y\Phi\,\beta + \alpha_1)(x(s),y(s))\,dV(s). \]

Here

\[ e_4\to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]

Define

\[ A_2(x,y) = (\nabla_y\Phi\,\beta + \alpha_1)(\nabla_y\Phi\,\beta + \alpha_1)^T(x,y) + \alpha_0(x,y)\alpha_0(x,y)^T. \]

The quadratic variation of $M(t)$ is

\[ \langle M\rangle_t = \int_0^t A_2(x(s),y(s))\,ds. \]

By use of the Poisson equation technique we can show that

\[ \int_0^t A_2(x(s),y(s))\,ds = \int_0^t K(x(s))K(x(s))^T\,ds + e_5(t) \]

where, as above,

\[ e_5\to 0 \quad\text{in } L^p\bigl(C([0,T],\mathcal{X});\Omega\bigr). \]

The remainder of the proof proceeds as in Theorem 2.3. □



3 Parameter Estimation 

Recall that $\Omega_0$ is the probability space for $W$. Imagine that we try to fit data $\{x(t)\}_{t\in[0,T]}$ from (2.3) or (2.4) to a homogenized or averaged equation of the form (2.7) or (2.9), but with unknown parameter $\theta\in\Theta$, where $\Theta$ is an open subset of $\mathbb{R}^k$, in the drift:

\[ \frac{dX}{dt} = F(X;\theta) + K(X)\frac{dW}{dt}. \tag{3.1} \]

Suppose that the actual drift compatible with the data is given by $F(X) = F(X;\theta_0)$. We ask whether it is possible to correctly identify $\theta = \theta_0$ by finding the maximum likelihood estimator (MLE) when using a statistical model of the form (3.1), but given data from (2.3) or (2.4). Recall that the averaging and homogenization techniques from the previous section show that $x(t)$ from (2.3) and (2.4) converges weakly to the solution of an equation of the form (3.1). We make the following assumptions concerning the model equations (3.1) which will be used to fit the data.
Assumptions 3.1. We assume that $K$ is uniformly positive-definite on $\mathcal{X}$. We also assume that (3.1) is ergodic with invariant measure $\nu(dx) = \pi(x)dx$ at $\theta = \theta_0$ and that

\[ A_\infty := \int_{\mathcal{X}} \bigl(K(x)^{-1}F(x)\otimes K(x)^{-1}F(x)\bigr)\pi(x)\,dx \tag{3.2} \]

is invertible.

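Given data $\{z(t)\}_{t\in[0,T]}$, the log likelihood function for $\theta$ satisfying (3.1) is given by

\[ L(\theta;z) = \int_0^T \langle F(z;\theta), dz\rangle_{a(z)} - \frac{1}{2}\int_0^T |F(z;\theta)|^2_{a(z)}\,dt, \tag{3.3} \]

where

\[ \langle p,q\rangle_{a(z)} = \langle K(z)^{-1}p, K(z)^{-1}q\rangle. \]

To be precise

\[ \frac{d\mathbb{P}}{d\mathbb{P}_0} = \exp\bigl(L(\theta;z)\bigr) \]

where $\mathbb{P}$ is the path space measure for (3.1) and $\mathbb{P}_0$ the path space measure for (3.1) with $F\equiv 0$. The MLE is

\[ \hat\theta = \arg\max_\theta L(\theta;z). \tag{3.4} \]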
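In practice the path is only available on a time grid, and (3.3) has to be discretized. The following is a minimal sketch for the scalar case, assuming equally spaced observations and a left-point (Itô) approximation of the stochastic integral; the function name `log_likelihood`, the linear drift and the synthetic noise-free data are assumptions made for illustration, not part of the paper.

```python
import numpy as np

def log_likelihood(theta, z, dt, drift, K):
    """Ito-type discretization of L(theta; z) in (3.3), scalar case."""
    z = np.asarray(z, dtype=float)
    z_left, dz = z[:-1], np.diff(z)
    F = drift(z_left, theta)
    w = 1.0 / K(z_left) ** 2      # in one dimension <p, q>_a = p * q / K^2
    return np.sum(w * F * dz) - 0.5 * np.sum(w * F**2) * dt

# Hypothetical model: F(x; theta) = -theta * x with K = 1.
drift = lambda x, th: -th * x
K = lambda x: np.ones_like(x)

# Noise-free synthetic data z_{n+1} = z_n - theta0 * z_n * dt, for checking only.
theta0, dt = 0.7, 0.01
z = (1.0 - theta0 * dt) ** np.arange(2001)

# Crude MLE: maximize the discretized likelihood over a grid of candidates.
thetas = np.linspace(0.0, 2.0, 201)
values = [log_likelihood(th, z, dt, drift, K) for th in thetas]
theta_mle = thetas[int(np.argmax(values))]
```

A grid search is used only to keep the sketch short; any one-dimensional optimizer could replace it.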

As a preliminary to understanding the effect of using multiscale data, we start by exhibiting an underlying property of the log-likelihood when confronted with data from the model (3.1) itself. The following theorem shows that, in this case: (i) in the limit $T\to\infty$ the log-likelihood is asymptotically independent of the particular sample path of (3.1) chosen, and depends only on the invariant measure $\pi$; (ii) as a consequence we see that, asymptotically, time-ordering of the data is irrelevant to parameter estimation; (iii) under some additional assumptions, the large $T$ expression also shows that choosing data from the model (3.1) leads to the correct estimation of drift parameters, in the limit $T\to\infty$.

Theorem 3.2. Let Assumptions 3.1 hold and let $\{X(t)\}_{t\in[0,T]}$ be a sample path of (3.1) with $\theta = \theta_0$. Then, in $L^2(\Omega_0)$ and almost surely with respect to $X(0)$,

\[ \lim_{T\to\infty}\frac{2}{T}L(\theta;X) = \int_{\mathcal{X}} |F(X;\theta_0)|^2_{a(X)}\pi(X)\,dX - \int_{\mathcal{X}} |F(X;\theta) - F(X;\theta_0)|^2_{a(X)}\pi(X)\,dX. \]

This expression is maximized by choosing $\theta = \theta_0$, in the limit $T\to\infty$.



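Proof. By Lemmas A.2 and A.3 in the appendix we deduce that, with all limits in $L^2(\Omega_0)$,

\[ \lim_{T\to\infty}\frac{1}{T}L(\theta;X) = \lim_{T\to\infty}\Bigl(\frac{1}{T}\int_0^T \langle F(X;\theta),F(X;\theta_0)\rangle_{a(X)}\,dt + \frac{1}{T}\int_0^T \langle F(X;\theta),K(X)\,dW\rangle_{a(X)} - \frac{1}{2T}\int_0^T |F(X;\theta)|^2_{a(X)}\,dt\Bigr) \]
\[ = \int_{\mathcal{X}} \langle F(X;\theta),F(X;\theta_0)\rangle_{a(X)}\pi(X)\,dX - \frac{1}{2}\int_{\mathcal{X}} |F(X;\theta)|^2_{a(X)}\pi(X)\,dX. \]

Completing the square provides the proof. □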
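The completing-the-square step can be written out pointwise in $X$. The shorthand $u = F(X;\theta)$ and $v = F(X;\theta_0)$ is introduced here only for this display.

```latex
\begin{align*}
2\Bigl(\langle u, v\rangle_{a(X)} - \tfrac{1}{2}|u|^2_{a(X)}\Bigr)
  &= |v|^2_{a(X)} - \Bigl(|u|^2_{a(X)} - 2\langle u, v\rangle_{a(X)} + |v|^2_{a(X)}\Bigr)\\
  &= |v|^2_{a(X)} - |u - v|^2_{a(X)}.
\end{align*}
```

Integrating against $\pi(X)\,dX$ gives the limit of $\frac{2}{T}L(\theta;X)$; the second term is non-positive and vanishes when $\theta = \theta_0$, which is why the limit is maximized there.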
In the particular case where the parameter $\theta$ appears linearly in the drift it can be viewed as an $\mathbb{R}^{l\times l}$ matrix $\Theta$ and

\[ F(X;\theta) = \Theta F(X). \tag{3.5} \]

The correct value for $\Theta$ is thus the $\mathbb{R}^{l\times l}$ identity matrix $I$. The maximum likelihood estimator is

\[ \hat\Theta(z;T) = A(z;T)^{-1}B(z;T) \tag{3.6} \]

where

\[ A(z;T) = \frac{1}{T}\int_0^T K(z)^{-1}F(z)\otimes K(z)^{-1}F(z)\,dt, \]
\[ B(z;T) = \frac{1}{T}\int_0^T K(z)^{-1}dz\otimes K(z)^{-1}F(z). \]

If $A(z;T)$ is not invertible then we set $\hat\Theta(z;T) = 0$. A result closely related to Theorem 3.2 is the following²:

Theorem 3.3. Let Assumptions 3.1 hold and let $\{X(t)\}_{t\in[0,T]}$ be a sample path of (3.1) with $\theta = \theta_0$ so that $F(X;\theta) = F(X)$. Then

\[ \lim_{T\to\infty}\hat\Theta(X;T) = I \]

in probability.

Proof. We observe that

\[ B(X;T) = A(X;T) + J_1 \]

where

\[ J_1 = \frac{1}{T}\int_0^T dW\otimes K(X)^{-1}F(X) \]

and where $\mathbb{E}|J_1|^2 = O(1/T)$ by Lemma A.2. By ergodicity, and Lemma A.3, we have that

\[ A(X;T) = A_\infty + J_2 \]

where $\mathbb{E}|J_2|^2 = O(1/T)$ and $A_\infty$ is given by (3.2). By Assumption 3.1 and for $T$ sufficiently large, $A(X;T)$ is invertible and we have

\[ \hat\Theta(X;T) = I + (A_\infty + J_2)^{-1}J_1 \]

and the result follows. □

Remark 3.4. The invertibility of $A_\infty$ is necessary in order to be able to successfully estimate the drift of the linear system.
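A discretized version of the estimator (3.6) is easy to try out in the scalar case. The sketch below assumes equally spaced data, a left-point approximation of the stochastic integral in $B$, and a hypothetical Ornstein-Uhlenbeck model ($F(x) = -x$, $K = 1$) from which the data are generated by Euler-Maruyama; the function name `theta_hat` is illustrative.

```python
import numpy as np

def theta_hat(z, dt, F, K):
    """Discretized scalar version of (3.6): A^{-1} B, or 0 if A vanishes."""
    z = np.asarray(z, dtype=float)
    z_left, dz = z[:-1], np.diff(z)
    T = dt * len(dz)
    kf = F(z_left) / K(z_left)                 # K^{-1} F at the left endpoints
    A = np.sum(kf * kf) * dt / T
    B = np.sum(dz / K(z_left) * kf) / T
    return B / A if A > 0 else 0.0

# Synthetic data from the model itself: Euler-Maruyama for dX = -X dt + dW.
rng = np.random.default_rng(1)
dt, n_steps = 0.01, 200_000
noise = np.sqrt(dt) * rng.standard_normal(n_steps)
X = np.empty(n_steps + 1)
X[0] = 0.0
for k in range(n_steps):
    X[k + 1] = X[k] - X[k] * dt + noise[k]

estimate = theta_hat(X, dt, F=lambda x: -x, K=lambda x: np.ones_like(x))
```

Since model and data match here, the estimate should be close to the correct value $1$ for large $T$, with fluctuations of size roughly $T^{-1/2}$.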

In order to prove an analogue of Theorem 3.3 when the drift depends nonlinearly on the parameter $\theta$ we need to make additional assumptions.

2 The proof is standard and we outline it only for comparison with the situation in the next subsection where data from a multiscale model is employed.
Assumptions 3.5.

• We assume that

\[ \inf_{|u|>\delta}\int_{\mathcal{X}} |F(X;\theta_0+u) - F(X;\theta_0)|^2_{a(X)}\pi(X)\,dX \ge \kappa(\delta) > 0, \quad \forall\delta > 0. \tag{3.7} \]

When (3.7) holds we will say that the system is identifiable.

• There exist an $\alpha > 0$ and $\widetilde{F}:\mathcal{X}\to\mathbb{R}$, square integrable with respect to the invariant measure, i.e. $\int_{\mathcal{X}}\widetilde{F}(X)^2\pi(X)\,dX < \infty$, such that

\[ |F(X;\theta) - F(X;\theta')|_{a(X)} \le |\theta-\theta'|^\alpha\,\widetilde{F}(X). \tag{3.8} \]

Under the above assumption we can prove convergence of the MLE to the correct value $\theta_0$.

Theorem 3.6. Suppose that Assumptions 3.1 and 3.5 hold. If, in addition, the parameter space is compact, then

\[ \lim_{T\to\infty}\hat\theta(X;T) = \theta_0 \]

in probability.

Proof. It is a straightforward application of the results in [22]. □

We now ask whether the likelihood behaves similarly when confronted with data $\{x(t)\}$ from the underlying multiscale systems (2.3) or (2.4). To address this issue we make the following natural assumptions regarding the invariant measure for these underlying multiscale systems.

Assumptions 3.7.

• The fast/slow SDE (2.3) (resp. (2.4)) is ergodic with invariant measure $\mu^\epsilon(dxdy)$ which is absolutely continuous with respect to the Lebesgue measure on $\mathcal{X}\times\mathcal{Y}$ with smooth density $\rho^\epsilon(x,y)$.

• The limiting SDE (2.7) or (2.9) is ergodic with invariant measure $\nu(dx)$ which is absolutely continuous with respect to the Lebesgue measure on $\mathcal{X}$ with smooth density $\pi(x)$.

• The measure $\mu^\epsilon(dxdy) = \rho^\epsilon(x,y)dxdy$ converges weakly to the measure $\mu(dxdy) = \pi(x)\rho(y;x)dxdy$ where $\rho(y;x)$ is the invariant density of the fast process (2.1) given in Assumption 2.1 and $\pi(x)$ is the invariant density for (2.7) (resp. (2.9)).

• The invariant measure $\mu^\epsilon(dxdy) = \rho^\epsilon(x,y)dxdy$ satisfies a Poincaré inequality with a constant independent of $\epsilon$: there exists a constant $C_p$ independent of $\epsilon$ such that for every mean zero $H^1(\mathcal{X}\times\mathcal{Y};\mu^\epsilon(dxdy))$ function $f$ we have that

\[ \|f\| \le C_p\|\nabla f\| \tag{3.9} \]

where $\nabla$ represents the gradient with respect to $(x^T,y^T)^T$ and $\|\cdot\|$ denotes the $L^2(\mathcal{X}\times\mathcal{Y};\mu^\epsilon(dxdy))$ norm.

We also need to assume that the fast/slow SDEs (2.3) and (2.4) are uniformly elliptic.
Assumption 3.8. Define the matrix field $\Sigma = \gamma\gamma^T$ where

Then there is $C_\gamma > 0$, independent of $\epsilon\to 0$, such that

\[ \langle\xi,\Sigma(x,y)\xi\rangle \ge C_\gamma|\xi|^2 \quad \forall (x,y)\in\mathcal{X}\times\mathcal{Y},\ \xi\in\mathbb{R}^d. \]

Remark 3.9. It is straightforward to show that, when $\mathcal{X} = \mathbb{T}^l$, $\mathcal{Y} = \mathbb{T}^{d-l}$, Assumptions 3.7 follow from Assumption 3.8 using properties of periodic functions [19], together with the compactness of the state space. When $\mathcal{X} = \mathbb{R}^l$, $\mathcal{Y} = \mathbb{R}^{d-l}$ more work is needed in order to prove that the invariant measure satisfies Poincaré's inequality with an $\epsilon$-independent constant, since this essentially requires proving that the generator of the fast/slow system has an $\epsilon$-independent spectral gap. In the case where the
fast/slow system has a gradient structure with a smooth potential V(x, y), then simple 
criteria on the potential have been derived that facilitate determination of whether or 
not the invariant measure satisfies the Poincare inequality. We refer to H24\ [5^ and the 
references therein for more details. 



3.1 Averaging 

We now ask what happens when the MLE for the averaged equation (3.1) is confronted with data from the original multiscale equation (2.3). The following result shows that, in this case, the estimator will behave well, for large time and small $\epsilon$. Large time is always required for convergence of drift parameter estimation, even when model and data match. In the limit $\epsilon\to 0$, $X(t)$ from (3.1) approximates $x(t)$ from (2.3).


Theorem 3.10. Let Assumptions 2.1, 3.1, 3.7 and 3.8 hold. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.3) and $\{X(t)\}_{t\in[0,T]}$ a sample path of (3.1) at $\theta = \theta_0$. Then the following limits, to be interpreted in $L^2(\Omega)$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $x(0)$, $y(0)$, $X(0)$, are identical:

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}L(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X). \]

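Proof. We start by observing that, by Lemma A.3 and Assumptions 3.7,

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}\int_0^T |F(x;\theta)|^2_{a(x)}\,dt = \lim_{\epsilon\to 0}\int_{\mathcal{X}\times\mathcal{Y}} |F(x;\theta)|^2_{a(x)}\rho^\epsilon(x,y)\,dxdy \]
\[ = \int_{\mathcal{X}\times\mathcal{Y}} |F(x;\theta)|^2_{a(x)}\pi(x)\rho(y;x)\,dxdy = \int_{\mathcal{X}} |F(x;\theta)|^2_{a(x)}\pi(x)\,dx, \]

where the limits are in $L^2(\Omega)$. Now, from Equation (2.3) it follows that

\[ \frac{1}{T}\int_0^T \langle F(x;\theta),dx\rangle_{a(x)} = \frac{1}{T}\int_0^T \langle F(x;\theta),f_1(x,y)\rangle_{a(x)}\,dt + \frac{1}{T}\int_0^T \langle F(x;\theta),\alpha_0(x,y)\,dU\rangle_{a(x)} + \frac{1}{T}\int_0^T \langle F(x;\theta),\alpha_1(x,y)\,dV\rangle_{a(x)}. \]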
The last two integrals tend to zero in $L^2(\Omega)$ as $T\to\infty$ by Lemma A.2. In order to analyze the first integral on the right hand side we consider the solution of the Poisson equation

\[ -\mathcal{L}_0\Lambda = \langle F(x;\theta), f_1(x,y) - F(x;\theta_0)\rangle_{a(x)}, \qquad \int_{\mathcal{Y}}\rho(y;x)\Lambda(y;x)\,dy = 0. \]

This has a unique solution $\Lambda(y;x)\in L^2_\rho(\mathcal{Y};x)$ by construction of $F$. Applying Itô's formula to $\Lambda$ gives

\[ \frac{d\Lambda}{dt} = \frac{1}{\epsilon}\mathcal{L}_0\Lambda + \frac{1}{\sqrt{\epsilon}}\mathcal{L}_1\Lambda + \mathcal{L}_2\Lambda + \frac{1}{\sqrt{\epsilon}}\nabla_y\Lambda\,\beta\frac{dV}{dt} + \nabla_x\Lambda\,\alpha_0\frac{dU}{dt} + \nabla_x\Lambda\,\alpha_1\frac{dV}{dt} \]

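which shows that

\[ \frac{1}{T}\int_0^T \langle F(x;\theta),f_1(x,y)\rangle_{a(x)}\,dt = \frac{1}{T}\int_0^T \langle F(x;\theta),F(x;\theta_0)\rangle_{a(x)}\,dt - \frac{\epsilon}{T}\bigl(\Lambda(x(T),y(T)) - \Lambda(x(0),y(0))\bigr) \]
\[ + \frac{\sqrt{\epsilon}}{T}\int_0^T (\mathcal{L}_1\Lambda)(x(t),y(t))\,dt + \frac{\epsilon}{T}\int_0^T (\mathcal{L}_2\Lambda)(x(t),y(t))\,dt + \frac{\sqrt{\epsilon}}{T}\int_0^T (\nabla_y\Lambda\,\beta)(x(t),y(t))\,dV(t) \]
\[ + \frac{\epsilon}{T}\int_0^T \bigl((\nabla_x\Lambda\,\alpha_0)(x(t),y(t))\,dU(t) + (\nabla_x\Lambda\,\alpha_1)(x(t),y(t))\,dV(t)\bigr). \]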

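The stochastic integrals tend to zero in $L^2(\Omega)$ as $T\to\infty$. By assumption $\Lambda$ is bounded. Furthermore, in $L^2(\Omega)$,

\[ \frac{1}{T}\int_0^T (\mathcal{L}_i\Lambda)(x(t),y(t))\,dt \to \int_{\mathcal{Y}} (\mathcal{L}_i\Lambda)(x,y)\rho(y;x)\,dy, \qquad i = 1,2. \]

Hence we deduce that

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}\int_0^T \langle F(x;\theta),f_1(x,y)\rangle_{a(x)}\,dt = \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}\int_0^T \langle F(x;\theta),F(x;\theta_0)\rangle_{a(x)}\,dt \]
\[ = \lim_{\epsilon\to 0}\int_{\mathcal{X}\times\mathcal{Y}} \langle F(x;\theta),F(x;\theta_0)\rangle_{a(x)}\rho^\epsilon(x,y)\,dxdy = \int_{\mathcal{X}} \langle F(x;\theta),F(x;\theta_0)\rangle_{a(x)}\pi(x)\,dx. \]

The result follows. □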

In the particular case of linear parameter dependence, when the MLE is given by (3.6), we have the following result, showing that the MLE recovers the correct answer from high frequency data compatible with the statistical model in an appropriate asymptotic limit.

Theorem 3.11. Let Assumptions 2.1, 3.1, 3.7 and 3.8 hold. Assume that $F(X;\theta)$ is given by (3.5). Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.3). Then $\hat\Theta$ given by (3.6) satisfies

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\hat\Theta(x;T) = I \]

in probability.
Proof. Using equation (2.3) we find that

\[ B(x;T) = A(x;T) + J_3 + J_4, \]

where

\[ J_3 = \frac{1}{T}\int_0^T K(x)^{-1}\bigl(f_1(x,y) - F(x)\bigr)\otimes K(x)^{-1}F(x)\,dt, \]
\[ J_4 = \frac{1}{T}\int_0^T K(x)^{-1}\bigl(\alpha_0(x,y)\,dU + \alpha_1(x,y)\,dV\bigr)\otimes K(x)^{-1}F(x). \]

Here, for fixed $\epsilon > 0$, $\mathbb{E}|J_4|^2 = O(1/T)$ by Lemma A.2 and

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\mathbb{E}|J_3|^2 = 0 \]

by use of the Poisson equation technique. By ergodicity, and Lemma A.3, we have that

\[ A(x;T) = A_{\infty,\epsilon} + J_5 \]

where

\[ A_{\infty,\epsilon} = \int_{\mathcal{X}\times\mathcal{Y}} \bigl(K(x)^{-1}F(x)\otimes K(x)^{-1}F(x)\bigr)\rho^\epsilon(x,y)\,dxdy, \]

with

\[ \lim_{\epsilon\to 0}A_{\infty,\epsilon} = A_\infty \]

and, for fixed $\epsilon > 0$, $\mathbb{E}|J_5|^2 = O(1/T)$.

Thus by Assumption 3.1 $A(x;T)$ is invertible for $T$ sufficiently large, and $\epsilon$ sufficiently small, so that

\[ \hat\Theta(x;T) = I + (A_{\infty,\epsilon} + J_5)^{-1}(J_3 + J_4). \]

The result follows. □
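The statement of Theorem 3.11 can be explored numerically on a toy averaging problem. The sketch below is an assumption-laden illustration, not the paper's example: it takes $f_1(x,y) = -y^2x$, $\alpha_0 = 1$, $\alpha_1 = 0$, $g_0 = -y$, $\beta = \sqrt{2}$, so that formally $F(x) = -x$ and $K = 1$ (the unbounded coefficients fall outside Assumptions 2.1). The slow variable is advanced by Euler-Maruyama and the fast Ornstein-Uhlenbeck variable by its exact transition; the function name `simulate_slow_path` is hypothetical.

```python
import numpy as np

def simulate_slow_path(eps, dt, n_steps, seed=0):
    """Slow component x of a toy fast/slow system of the form (2.3)."""
    rng = np.random.default_rng(seed)
    decay = np.exp(-dt / eps)                   # exact OU decay over one step
    noise_scale = np.sqrt(1.0 - decay**2)
    xi_u = rng.standard_normal(n_steps)
    xi_v = rng.standard_normal(n_steps)
    sqrt_dt = np.sqrt(dt)
    x = np.empty(n_steps + 1)
    x[0], y = 0.0, 0.0
    for k in range(n_steps):
        x[k + 1] = x[k] - y * y * x[k] * dt + sqrt_dt * xi_u[k]
        y = decay * y + noise_scale * xi_v[k]
    return x

eps, dt, n_steps = 0.01, 0.001, 200_000         # T = 200, dt resolves eps
x = simulate_slow_path(eps, dt, n_steps)

# Discretized estimator (3.6) for the averaged model F(x) = -x, K = 1.
F_left, dx = -x[:-1], np.diff(x)
estimate = np.sum(F_left * dx) / (np.sum(F_left**2) * dt)
```

For small `eps` and large `T` the estimate should be near $1$, in line with the averaging result; no subsampling is applied here.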
We would like to show that this also holds for the general case, i.e. if

\[ \hat\theta(x;T) := \arg\max_\theta L(\theta;x) \]

then

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\hat\theta(x;T) = \theta_0, \quad\text{in probability.} \]

In fact, the following theorem is true for every $\epsilon > 0$.



Theorem 3.12. Let Assumptions 2.1, 3.1, 3.5, 3.7 and 3.8 hold and assume that $\theta\in\Theta$, a compact set. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.3) at $\theta = \theta_0$. Assume furthermore that the marginal of the invariant measure of (2.3) on $\mathcal{X}$, $\pi^\epsilon(x)dx = \bigl(\int_{\mathcal{Y}}\rho^\epsilon(x,y)\,dy\bigr)dx$, is absolutely continuous with respect to the invariant measure of the limiting SDE, $\pi(x)dx$. Then, for every $\epsilon > 0$,

\[ \lim_{T\to\infty}\hat\theta(x;T) = \theta_0, \quad\text{in probability.} \]
Proof. Let $g_T(x,\theta) := \frac{1}{T}L(\theta;x)$ and

It is straightforward to see that

\[ \arg\max_\theta g_\infty(\theta) = \theta_0 \]

by completing the square. We apply Lemma A.4 replacing $\epsilon$ by $1/T$, $g_\epsilon$ by $g_T$ and $g_0$ by $g_\infty$. The result follows, provided that conditions (A.2), (A.3) and (A.4) are satisfied. Condition (A.2) follows from Theorem 3.10. The identifiability condition (A.4) follows from Assumptions 3.5 and the absolute continuity of $\pi^\epsilon(x)dx = \bigl(\int_{\mathcal{Y}}\rho^\epsilon(x,y)\,dy\bigr)dx$ with respect to $\pi(x)dx$. Finally, we can verify that (A.3) holds, following the proof in [22] and using the fact that the functions $f_1$, $\alpha_0$ and $\alpha_1$ are uniformly bounded. □



3.2 Homogenization 

We now ask what happens when the MLE for the homogenized equation (3.1) is confronted with data from the multiscale equation (2.4), which homogenizes to give (3.1). The situation differs substantially from the case where data is taken from the multiscale equations (2.3), which averages to give (3.1): the two likelihoods are not identical in the large $T$ limit.

In order to state the main result of this subsection we need to introduce the Poisson equation

\[ -\mathcal{L}_0\Gamma = \langle F(x;\theta), f_0(x,y)\rangle_{a(x)}, \qquad \int_{\mathcal{Y}}\rho(y;x)\,\Gamma(y;x)\,dy = 0, \tag{3.10} \]

which has a unique solution $\Gamma(y;x)\in L^2(\mathcal{Y};x)$. Note that

\[ \Gamma = \langle F(x;\theta),\Phi(x,y)\rangle_{a(x)}, \]

where $\Phi$ solves (2.8). Define

\[ E_\infty(\theta) = \int_{\mathcal{X}\times\mathcal{Y}}\Big(\mathcal{L}_1\Gamma(x,y) - \big\langle F(x;\theta),(\mathcal{L}_1\Phi(x,y))\big\rangle_{a(x)}\Big)\pi(x)\rho(y;x)\,dx\,dy. \tag{3.11} \]

The following theorem shows that the correct limit of the log likelihood is not obtained unless $E_\infty = 0$, something which will not be true in general. However, in the case where $f_0, g_1 \equiv 0$ we do obtain $E_\infty = 0$, and in this case we recover the averaging situation covered in Theorem 2.3 and Theorem 3.10 (with $\epsilon$ replaced by $\epsilon^2$).

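Theorem 3.13. Let Assumptions 2.1, 2.4, 3.1, 3.7 and 3.8 hold. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.4) and $\{X(t)\}_{t\in[0,T]}$ a sample path of (3.1) at $\theta = \theta_0$. Then the following limits, to be interpreted in $L^2(\Omega)$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $x(0)$, $y(0)$, $X(0)$, are identical:

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}L(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) + E_\infty(\theta). \]

Proof. As in the averaging case of Theorem 3.10 we have

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}\int_0^T |F(x;\theta)|^2_{a(x)}\,dt = \int_{\mathcal{X}}|F(x;\theta)|^2_{a(x)}\,\pi(x)\,dx. \]

Now

\[ \frac{1}{T}\int_0^T\langle F(x;\theta),dx\rangle_{a(x)} = I_1 + I_2 + I_3, \]

where

\[ I_1 = \frac{1}{\epsilon T}\int_0^T\langle F(x;\theta),f_0(x,y)\rangle_{a(x)}\,dt, \]

\[ I_2 = \frac{1}{T}\int_0^T\langle F(x;\theta),f_1(x,y)\rangle_{a(x)}\,dt, \]

\[ I_3 = \frac{1}{T}\int_0^T\langle F(x;\theta),\alpha_0(x,y)\,dU + \alpha_1(x,y)\,dV\rangle_{a(x)}. \]

Now $I_3$ is $O(1/\sqrt{T})$ in $L^2(\Omega)$ by Lemma A.2. Techniques similar to those used in the proof of Theorem 3.10 show that

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty} I_2 = \int_{\mathcal{X}}\langle F(x;\theta),F_1(x;\theta_0)\rangle_{a(x)}\,\pi(dx). \]

Now consider $I_1$. Applying Itô's formula to the solution $\Gamma$ of the Poisson equation (3.10), we obtain

\[ \frac{d\Gamma}{dt} = \frac{1}{\epsilon^2}\mathcal{L}_0\Gamma + \frac{1}{\epsilon}\mathcal{L}_1\Gamma + \mathcal{L}_2\Gamma + \frac{1}{\epsilon}\nabla_y\Gamma\beta\frac{dV}{dt} + \nabla_x\Gamma\alpha_0\frac{dU}{dt} + \nabla_x\Gamma\alpha_1\frac{dV}{dt}. \]

From this we deduce that

\[ \frac{1}{\epsilon T}\int_0^T\langle F(x;\theta),f_0(x,y)\rangle_{a(x)}\,dt = \frac{1}{T}\int_0^T(\mathcal{L}_1\Gamma)\,dt + I_4, \]

where

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty} I_4 = 0. \]

Thus

\[ I_1 = \frac{1}{\epsilon T}\int_0^T\langle F(x;\theta),f_0(x,y)\rangle_{a(x)}\,dt = I_4 + I_5 + I_6, \]

where, in $L^2(\Omega)$,

\[ I_5 = \frac{1}{T}\int_0^T\big\langle F(x;\theta),(\mathcal{L}_1\Phi(x,y))\big\rangle_{a(x)}\,dt, \]

\[ I_6 = \frac{1}{T}\int_0^T\Big(\mathcal{L}_1\Gamma(x,y) - \big\langle F(x;\theta),(\mathcal{L}_1\Phi(x,y))\big\rangle_{a(x)}\Big)dt. \]

By the methods used in the proof of Theorem 3.10 we deduce that

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty} I_5 = \int_{\mathcal{X}}\langle F(x;\theta),F_0(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx. \]

Putting together all the estimates we deduce that, in $L^2$,

\[ \lim_{\epsilon\to 0}\lim_{T\to\infty}\frac{1}{T}L(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) + \lim_{\epsilon\to 0}\lim_{T\to\infty} I_6 = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) + E_\infty(\theta). \]

□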



4 Subsampling 

In the previous section we studied the behavior of estimators when confronted with multiscale data. The data is such that, in an appropriate asymptotic limit $\epsilon\to 0$, it behaves weakly as if it comes from a single scale equation in the form of the statistical model. By considering the behavior of continuous time estimators in the limit of large time, followed by taking $\epsilon\to 0$, we studied the behavior of estimators which do not subsample the data. We showed that in the averaging set-up this did not cause a problem: the likelihood behaves as if confronted with data from the statistical model itself; but in the homogenization set-up the likelihood function was asymptotically biased for large time. In this section we show that subsampling the data can overcome this issue, provided the subsampling rate is chosen appropriately.

In the following we use $\mathbb{E}^{\pi}$ to denote expectation on $\mathcal{X}$ with respect to the measure with density $\pi$ and $\mathbb{E}^{\rho^{\epsilon}}$ to denote expectation on $\mathcal{X}\times\mathcal{Y}$ with respect to the measure with density $\rho^{\epsilon}$. Recall that, by Assumption 3.7, the latter measure has weak limit with density $\pi(x)\rho(y;x)$. Let $\Omega' = \Omega\times\mathcal{X}\times\mathcal{Y}$ and consider the probability measure induced on paths $x, y$ solving (2.4) by choosing initial conditions distributed according to the measure $\pi(x)\rho(y;x)\,dx\,dy$. With expectation $\mathbb{E}$ under this measure we will also use the notation

\[ \|\cdot\|_p := \big(\mathbb{E}|\cdot|^p\big)^{1/p}. \]

We define the discrete log likelihood function found from applying the likelihood principle to the Euler-Maruyama approximation of the statistical model (3.1). Let $z = \{z_n\}_{n=0}^{N}$ denote a time series in $\mathcal{X}$. We obtain the likelihood

\[ L^{N,\delta}(\theta;z) = \sum_{n=0}^{N-1}\langle F(z_n;\theta),z_{n+1}-z_n\rangle_{a(z_n)} - \frac{1}{2}\sum_{n=0}^{N-1}|F(z_n;\theta)|^2_{a(z_n)}\,\delta. \]
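As a minimal sketch of this definition (not the authors' code), the following Python function evaluates the discrete log likelihood $\sum_n \langle F(z_n;\theta), z_{n+1}-z_n\rangle_{a(z_n)} - \frac12\sum_n |F(z_n;\theta)|^2_{a(z_n)}\delta$ for a scalar time series. It assumes a one-dimensional state, so that the weighted inner product reduces to $\langle u,v\rangle_{a} = uv/a$ with $a = K^2$. The names `discrete_log_likelihood`, `drift` and `diffusion_sq` are hypothetical, and the parameter $\theta$ is assumed to be already bound into `drift`.

```python
from typing import Callable, Sequence


def discrete_log_likelihood(
    z: Sequence[float],
    delta: float,
    drift: Callable[[float], float],
    diffusion_sq: Callable[[float], float],
) -> float:
    """Evaluate the discrete log likelihood for a scalar series z_0, ..., z_N.

    drift(z) plays the role of F(z; theta) with theta already fixed, and
    diffusion_sq(z) plays the role of a(z) = K(z)^2 (scalar assumption).
    """
    total = 0.0
    for n in range(len(z) - 1):
        f = drift(z[n])
        a = diffusion_sq(z[n])
        increment = z[n + 1] - z[n]
        # <F, dz>_a - (1/2) |F|_a^2 * delta, with <u, v>_a = u * v / a
        total += f * increment / a - 0.5 * f * f * delta / a
    return total


# Illustrative call: linear drift -z, constant a = 2, one increment of size -0.5.
example_value = discrete_log_likelihood(
    [1.0, 0.5], 0.1, lambda x: -x, lambda x: 2.0
)
```

Maximizing this quantity over the parameter hidden in `drift` gives the subsampled estimator; the step `delta` is the spacing between retained observations, not the resolution of the raw data.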



Let $x_n = x(n\delta)$, noting that $x(t)$ depends on $\epsilon$, and set $x = \{x_n\}_{n=0}^{N}$. The basic theorem in this section proves convergence of the log likelihood function, provided that we subsample (i.e. choose $\delta$) at an appropriate $\epsilon$-dependent rate. We state and prove the theorem, relying on a pair of intuitively reasonable propositions which we then prove at the end of the section.

Theorem 4.1. Let Assumptions 2.1, 2.4, 3.1, 3.7 and 3.8 hold. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.4) and $X(t)$ a sample path of (3.1) at $\theta = \theta_0$. Let $\delta = \epsilon^{\alpha}$ with $\alpha\in(0,1)$ and let $N = [\epsilon^{-\gamma}]$ with $\gamma > \alpha$. Then the following limits, to be interpreted in $L^2(\Omega')$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $X(0)$, are identical:

\[ \lim_{\epsilon\to 0}\frac{1}{N\delta}L^{N,\delta}(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X). \tag{4.1} \]

The proof of this theorem is based on the following two technical results, whose 
proofs are presented in the appendix. 
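To make the sampling regime of Theorem 4.1 concrete, the sketch below computes the sampling interval and the number of observations for a given scale separation. It assumes the reading $\delta = \epsilon^{\alpha}$ with $\alpha\in(0,1)$ and $N = \lfloor\epsilon^{-\gamma}\rfloor$ with $\gamma > \alpha$, so that the observation window $N\delta$ grows as $\epsilon\to 0$. The function name `subsampling_schedule` is hypothetical.

```python
import math


def subsampling_schedule(eps: float, alpha: float, gamma: float):
    """Return (delta, N, N * delta) with delta = eps**alpha, N = floor(eps**(-gamma))."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps is assumed to lie in (0, 1)")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not gamma > alpha:
        raise ValueError("gamma must exceed alpha")
    delta = eps ** alpha                     # spacing between retained observations
    n_obs = math.floor(eps ** (-gamma))      # number of retained increments
    return delta, n_obs, n_obs * delta       # last entry is the time horizon


delta, n_obs, horizon = subsampling_schedule(0.25, 0.5, 1.0)
```

Because $\gamma > \alpha$, the horizon $N\delta \approx \epsilon^{\alpha-\gamma}$ tends to infinity while $\delta$ tends to zero, which is the regime in which the discrete likelihood is compared with the long-time limit of the continuous one.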

Proposition 4.2. Let $(x(t),y(t))$ be the solution of (2.4) and assume that Assumptions 2.1 and 2.4 hold. Then, for $\epsilon$, $\delta$ sufficiently small, the increment of the process $x(t)$ can be written in the form

\[ x_{n+1} - x_n = F(x_n;\theta_0)\,\delta + M_n + R(\epsilon,\delta), \tag{4.2} \]

where $M_n$ denotes the martingale term

\[ M_n = \int_{n\delta}^{(n+1)\delta}(\nabla_y\Phi\beta + \alpha_0)(x(s),y(s))\,dV + \int_{n\delta}^{(n+1)\delta}\alpha_1(x(s),y(s))\,dU \]

with $\|M_n\|_p \le C\sqrt{\delta}$ and

\[ \|R(\epsilon,\delta)\|_p \le C\big(\delta^{3/2} + \epsilon\delta^{1/2} + \epsilon\big). \]



Proposition 4.3. Let $g\in C(\mathcal{X})$ and let Assumptions 3.7 hold. Assume that $\epsilon$ and $N$ are related as in Theorem 4.1. Then

\[ \lim_{\epsilon\to 0}\frac{1}{N}\sum_{n=0}^{N-1}g(x_n) = \mathbb{E}^{\pi}g, \]

where the convergence is in $L^2$ with respect to the measure on initial conditions with density $\pi(x)\rho(y;x)$.

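Proof of Theorem 4.1. We define

\[ I_1(x,\theta) = \sum_{n=0}^{N-1}\langle F(x_n;\theta),x_{n+1}-x_n\rangle_{a(x_n)} \]

and

\[ I_2(x) = \sum_{n=0}^{N-1}|F(x_n;\theta)|^2_{a(x_n)}\,\delta. \]

By Proposition 4.3 we have that

\[ \frac{1}{N\delta}I_2(x) \to \int_{\mathcal{X}}|F(x;\theta)|^2_{a(x)}\,\pi(dx). \]

We use Proposition 4.2 to deduce that

\[ \frac{1}{N\delta}I_1(x;\theta) = \frac{1}{N\delta}\sum_{n=0}^{N-1}\big\langle F(x_n;\theta),F(x_n;\theta_0)\,\delta + M_n + R(\epsilon,\delta)\big\rangle_{a(x_n)} \]

\[ = \frac{1}{N}\sum_{n=0}^{N-1}\langle F(x_n;\theta),F(x_n;\theta_0)\rangle_{a(x_n)} + \frac{1}{N\delta}\sum_{n=0}^{N-1}\langle F(x_n;\theta),M_n\rangle_{a(x_n)} + \frac{1}{N\delta}\sum_{n=0}^{N-1}\langle F(x_n;\theta),R(\epsilon,\delta)\rangle_{a(x_n)} \]

\[ =: J_1 + J_2 + J_3. \]

Again using Proposition 4.3 we have that

\[ J_1 \to \int_{\mathcal{X}}\langle F(x;\theta),F(x;\theta_0)\rangle_{a(x)}\,\pi(dx). \]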

Furthermore, using the fact that M n is independent of x n and has quadratic variation 
of order 5 it follows that 

1 N ~ 1 

ra=0 

- AT<T 

Here Q is defined to obtain the correct quadratic variation of the M n . Consequently, 
and since 7 > a, 

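\[ \|J_2\|_2 \le o(1) \]

as $\epsilon\to 0$. Similarly, using martingale moment inequalities [10, Eq. (3.25) p. 163] we obtain

\[ \|J_2\|_p \le o(1). \]

Finally, again using Proposition 4.2 we have, for $q^{-1} + p^{-1} = 1$,

\[ \|J_3\|_p \le \frac{1}{N\delta}\sum_{n=0}^{N-1}\|F(x_n;\theta)\|_q\,\|R(\epsilon,\delta)\|_p \le C\big(\delta^{1/2} + \epsilon\delta^{-1} + \epsilon\delta^{-1/2}\big) \le o(1), \]

as $\epsilon\to 0$, since we have assumed that $\alpha\in(0,1)$.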
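The role of the restriction $\alpha\in(0,1)$ can be checked by simple exponent bookkeeping. Dividing the remainder bound of Proposition 4.2, $C(\delta^{3/2} + \epsilon\delta^{1/2} + \epsilon)$, by $\delta$ and substituting $\delta = \epsilon^{\alpha}$ gives three powers of $\epsilon$ with exponents $\alpha/2$, $1-\alpha/2$ and $1-\alpha$. The sketch below, with hypothetical helper names, lists these exponents and tests whether all of them are positive.

```python
def remainder_exponents(alpha: float):
    """Exponents of eps in delta**0.5, eps/delta**0.5 and eps/delta for delta = eps**alpha."""
    return (alpha / 2.0, 1.0 - alpha / 2.0, 1.0 - alpha)


def remainder_vanishes(alpha: float) -> bool:
    """All three terms tend to zero as eps -> 0 iff every exponent is positive."""
    return all(exponent > 0.0 for exponent in remainder_exponents(alpha))


exponents_half = remainder_exponents(0.5)
```

The binding constraint is the last exponent, $1-\alpha$: sampling as slowly as $\delta = \epsilon$ (that is, $\alpha = 1$) already leaves a remainder of order one.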
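We thus have

\[ \lim_{\epsilon\to 0}\frac{1}{N\delta}L^{N,\delta}(\theta;x) = \int_{\mathcal{X}}\langle F(x;\theta),F(x;\theta_0)\rangle_{a(x)}\,\pi(x)\,dx - \frac{1}{2}\int_{\mathcal{X}}|F(x;\theta)|^2_{a(x)}\,\pi(x)\,dx. \]

By completing the square we obtain (4.1). □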

As before, we would like to use this theorem in order to prove the consistency of our estimator. The theory developed in [22] no longer applies because it is based on the assumption that the function we are maximizing (i.e. the log likelihood function) is a continuous semimartingale, which is not true for the discrete semimartingale $L^{N,\delta}(\theta;x)$. The most difficult part in proving consistency is to prove that the martingale converges uniformly to zero (Assumption (A.3) in Lemma A.4). To avoid this difficulty, we make some extra assumptions that allow us to get rid of the martingale part:

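Assumptions 4.4. 1. There exists a function $V:\mathcal{X}\times\Theta\to\mathbb{R}$ such that for each $\theta\in\Theta$, $V(\cdot,\theta)\in C^3(\mathcal{X})$ and

\[ \nabla V(z;\theta) = \big(K(z)K(z)^T\big)^{-1}F(z;\theta), \quad \forall z\in\mathcal{X},\ \theta\in\Theta. \tag{4.4} \]

2. Define $G:\mathcal{X}\times\Theta\to\mathbb{R}$ as follows:

\[ G(z;\theta) := D^2V(z;\theta) : \big(K(z)K(z)^T\big), \]

where $D^2V$ denotes the Hessian matrix of $V$. Then there exist a $\beta > 0$ and $\tilde{G}:\mathcal{X}\to\mathbb{R}$ that is square integrable with respect to the invariant measure, such that

\[ |G(z;\theta) - G(z;\theta')| \le |\theta-\theta'|^{\beta}\,\tilde{G}(z). \]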

Suppose that the above assumption is true and $\{X(t)\}_{t\in[0,T]}$ is a sample path of (3.1). Then, if we apply Itô's formula to the function $V$, we get that for every $\theta\in\Theta$:

\[ dV(X(t);\theta) = \langle\nabla V(X(t);\theta),dX(t)\rangle + \frac{1}{2}G(X(t);\theta)\,dt. \]

But from (4.4) we have that

\[ \langle\nabla V(X(t);\theta),dX(t)\rangle = \big\langle\big(K(X(t))K(X(t))^T\big)^{-1}F(X(t);\theta),dX(t)\big\rangle = \langle F(X(t);\theta),dX(t)\rangle_{a(X(t))} \]

and thus

\[ \langle F(X(t);\theta),dX(t)\rangle_{a(X(t))} = dV(X(t);\theta) - \frac{1}{2}G(X(t);\theta)\,dt. \]

Using this identity, we can write the log-likelihood function (3.3) in the form

\[ L(\theta;X(t)) = \big(V(X(T);\theta) - V(X(0);\theta)\big) - \frac{1}{2}\int_0^T\Big(|F(X(t);\theta)|^2_{a(X(t))} + G(X(t);\theta)\Big)dt. \]

Using this version of the log-likelihood function, we define

\[ L^{N,\delta}(\theta;z) = -\frac{1}{2}\sum_{n=0}^{N-1}\Big(|F(z_n;\theta)|^2_{a(z_n)} + G(z_n;\theta)\Big)\delta. \tag{4.5} \]
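To illustrate the martingale-free likelihood $-\frac12\sum_{n=0}^{N-1}(|F(z_n;\theta)|^2_{a(z_n)} + G(z_n;\theta))\delta$, consider a scalar Ornstein-Uhlenbeck model chosen here purely as an example (it is not taken from the text): $F(z;\theta) = -\theta z$ with constant $K = \sigma$. Then $\nabla V = -\theta z/\sigma^2$, so $V(z;\theta) = -\theta z^2/(2\sigma^2)$ and $G = D^2V\,\sigma^2 = -\theta$. Setting the $\theta$-derivative of the sum to zero gives the closed form $\hat\theta = N\sigma^2/(2\sum_n z_n^2)$. The function names below are hypothetical.

```python
from typing import Sequence


def ou_martingale_free_likelihood(
    theta: float, z: Sequence[float], delta: float, sigma: float
) -> float:
    """Martingale-free likelihood for F(z; theta) = -theta * z and K = sigma.

    Uses |F|_a^2 = theta**2 * z**2 / sigma**2 and G(z; theta) = -theta.
    """
    a = sigma * sigma
    total = 0.0
    for z_n in z[:-1]:  # n = 0, ..., N-1
        total += (theta * theta * z_n * z_n / a - theta) * delta
    return -0.5 * total


def ou_martingale_free_mle(z: Sequence[float], sigma: float) -> float:
    """Closed-form maximizer in theta of the expression above."""
    n_obs = len(z) - 1
    sum_sq = sum(z_n * z_n for z_n in z[:-1])
    return n_obs * sigma * sigma / (2.0 * sum_sq)


series = [1.0, -1.0, 1.0, -1.0, 0.0]
theta_hat = ou_martingale_free_mle(series, 1.0)
```

In this example the estimator depends on the data only through the empirical second moment, which matches the stationary variance $\sigma^2/(2\theta)$ of the model; no increments of the path enter, which is why the martingale term disappears.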

Now we can prove asymptotic consistency of the MLE, provided that we subsample at 
the appropriate sampling rate. 

Theorem 4.5. Let Assumptions 2.1, 2.4, 3.1, 3.5, 3.7, 3.8 and 4.4 hold and assume that $\theta\in\Theta$, a compact set. Let $\{x(t)\}_{t\in[0,T]}$ be a sample path of (2.4) at $\theta = \theta_0$. Define

\[ \hat\theta(x;\epsilon) := \arg\max_{\theta} L^{N,\delta}(\theta;x) \]

with $N$ and $\delta$ as in Theorem 4.1 above and $L^{N,\delta}(\theta;x)$ defined in (4.5). Then,

\[ \lim_{\epsilon\to 0}\hat\theta(x;\epsilon) = \theta_0, \quad \text{in probability.} \]

Proof. We apply Lemma A.4 with $g_\epsilon(x,\theta) = \frac{1}{N\delta}L^{N,\delta}(\theta;x)$ and $g_0(\theta)$ its limit. Note that

\[ \lim_{\epsilon\to 0}\frac{1}{N\delta}L^{N,\delta}(\theta;x) = \lim_{T\to\infty}\frac{1}{T}L(\theta;X) \]

by Proposition 4.3 and the fact that

\[ \lim_{T\to\infty}\frac{1}{T}\big(V(X(T);\theta) - V(X(0);\theta)\big) = 0, \]

which follows from the ergodicity of $X$. As in Theorem 4.1 the limits are interpreted in $L^2(\Omega')$ and $L^2(\Omega_0)$ respectively, and almost surely with respect to $X(0)$. As we have already seen, the maximizer of $g_0(\theta)$ is $\theta_0$. So, Assumption (A.2) is satisfied. Also, Assumption 3.5 is equivalent to (A.4). To prove consistency, we need to prove (A.3), which can be viewed as uniform ergodicity. The proof is again similar to that in [22]. First, we note that by Assumptions 3.5 and 4.4 both $g_\epsilon(\cdot,\theta)$ and $g_0(\theta)$ are continuous with respect to $\theta$, so it is sufficient to prove (A.3) on a countable dense subset $\Theta^*$ of $\Theta$. Then, uniform ergodicity follows from [5, Thm. 6.1.5], provided that

\[ N_{[\,]}\big(\epsilon,\mathcal{F},\|\cdot\|_{L^1(\pi)}\big) < \infty, \]

i.e. the number of balls of radius $\epsilon$ with respect to $\|\cdot\|_{L^1(\pi)}$ needed to cover

\[ \mathcal{F} := \big\{|F(z;\theta)|^2_{a(z)} + G(z;\theta) : \theta\in\Theta^*\big\} \]

is finite. As demonstrated in [22], this follows from the Hölder continuity of $|F(z;\theta)|^2_{a(z)}$ and $G(z;\theta)$. □



5 Examples 

Numerical experiments, illustrating the phenomena studied in this paper, can be found in the paper [19]. The experiments therein are concerned with a particular case of the general homogenization framework considered in this paper and illustrate the failure of the MLE when the data is sampled too frequently, and the role of subsampling to ameliorate this problem. In this section we construct two examples which identify the term responsible for the failure of the MLE.

5.1 Langevin Equation in the High Friction Limit 

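We consider the Langevin equation in the high friction limit³

\[ \epsilon^2\frac{d^2q}{dt^2} = -\nabla_qV(q;\theta) - \frac{dq}{dt} + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}, \tag{5.1} \]

where $V(q;\theta)$ is a smooth confining potential depending on a parameter $\theta\in\Theta\subset\mathbb{R}^{\ell}$,⁴ $\beta$ stands for the inverse temperature and $W(t)$ is standard Brownian motion on $\mathbb{R}^d$. We write this equation as a first order system

\[ \frac{dq}{dt} = \frac{1}{\epsilon}p, \qquad \frac{dp}{dt} = -\frac{1}{\epsilon}\nabla_qV(q;\theta) - \frac{1}{\epsilon^2}p + \frac{1}{\epsilon}\sqrt{2\beta^{-1}}\,\frac{dW}{dt}. \tag{5.2} \]

In the notation of the general homogenization set-up we have $(x,y) = (q,p)$ and

\[ f_0 = p, \quad f_1 = 0, \quad \alpha_0 = 0, \quad \alpha_1 = 0 \]

and

\[ g_0 = -p, \quad g_1 = -\nabla_qV(q;\theta), \]

with the fast equation driven by noise of strength $\sqrt{2\beta^{-1}}$. The fast process is simply an Ornstein-Uhlenbeck process with generator

\[ \mathcal{L}_0 = -p\cdot\nabla_p + \beta^{-1}\Delta_p. \]

The unique square integrable (with respect to the invariant measure of the OU process) solution of the Poisson equation (2.8) is $\Phi = p$. Therefore,

\[ F_0 = -\nabla_qV(q;\theta), \quad F_1 = 0, \quad A_1 = \sqrt{2\beta^{-1}}\,I. \]

Hence the homogenized equation is⁵

\[ \frac{dX}{dt} = -\nabla_qV(X;\theta) + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}. \tag{5.3} \]

³ We have rescaled the equation in such a way that we actually consider the small mass, rather than the high friction limit. In the case where the mass and the friction are scalar quantities the two scaling limits are equivalent.

⁴ A standard example is that of a quadratic potential $V(q;\theta) = \frac{1}{2}q\theta q^T$ where the parameters to be estimated from time series are the elements of the stiffness matrix $\theta$.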

Consider now the parameter estimation problem for the "full dynamics" (5.1) and the "coarse grained" model (5.3): we are given data from (5.1) and we want to fit it to equation (5.3). Theorem 3.13 implies that for this problem the maximum likelihood estimator is asymptotically biased.⁶ In fact, in this case we can compute the term $E_\infty$, responsible for the bias and given in equation (3.11). We have the following result.

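Proposition 5.1. Assume that the potential $V(q;\theta)\in C^{\infty}(\mathbb{R}^d)$ is such that $e^{-\beta V(q;\theta)}\in L^1(\mathbb{R}^d)$ for every $\beta > 0$ and all $\theta\in\Theta$. Then the error term $E_\infty$, eqn. (3.11), for the SDE (5.1) is given by the formula

\[ E_\infty(\theta) = -Z_V^{-1}\frac{\beta}{2}\int_{\mathbb{R}^d}|\nabla_qV(q;\theta)|^2e^{-\beta V(q;\theta)}\,dq, \tag{5.4} \]

where $Z_V = \int_{\mathbb{R}^d}e^{-\beta V(q;\theta)}\,dq$. In particular, $E_\infty < 0$.

Proof. We have that

\[ \mathcal{L}_1 = p\cdot\nabla_q - \nabla_qV\cdot\nabla_p. \]

The invariant measure of the process is $\epsilon$-independent and we write it as

\[ \rho(q,p;\theta)\,dq\,dp = Z^{-1}e^{-\beta H(q,p;\theta)}\,dq\,dp, \]

with $H(q,p;\theta) = \frac{1}{2}|p|^2 + V(q;\theta)$. Furthermore, since the homogenized diffusion matrix is $\sqrt{2\beta^{-1}}\,I$,

\[ \langle\cdot,\cdot\rangle_{a(x)} = \frac{\beta}{2}\langle\cdot,\cdot\rangle, \]

where $\langle\cdot,\cdot\rangle$ stands for the standard Euclidean inner product. We readily check that

\[ \frac{2}{\beta}\mathcal{L}_1\Gamma = \mathcal{L}_1\langle-\nabla_qV,p\rangle = -p\otimes p : D_q^2V(q;\theta) + |\nabla_qV(q;\theta)|^2 \]

and

\[ \frac{2}{\beta}\langle F,\mathcal{L}_1\Phi\rangle_{a} = \langle-\nabla_qV,\mathcal{L}_1p\rangle = |\nabla_qV(q;\theta)|^2. \]

Thus,

\[ E_\infty(\theta) = -\frac{\beta}{2}\int_{\mathbb{R}^{2d}}p\otimes p : D_q^2V(q;\theta)\,Z^{-1}e^{-\beta H(q,p;\theta)}\,dq\,dp \]

\[ = -\frac{1}{2}\int_{\mathbb{R}^d}\Delta_qV(q;\theta)\,Z_V^{-1}e^{-\beta V(q;\theta)}\,dq = -\frac{\beta}{2}\int_{\mathbb{R}^d}|\nabla_qV(q;\theta)|^2\,Z_V^{-1}e^{-\beta V(q;\theta)}\,dq, \]

which is precisely (5.4). □

⁵ In this case we can actually prove strong convergence of $q(t)$ to $X(t)$ [21, 18].

⁶ Subsampling, at the rate given in Theorem 4.1, is necessary for the correct estimation of the parameters in the drift of the homogenized equation (5.3).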
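As a numerical sanity check of this formula, read as $E_\infty(\theta) = -\frac{\beta}{2}Z_V^{-1}\int|\nabla_qV|^2e^{-\beta V}\,dq$, the sketch below evaluates it in one dimension by midpoint quadrature. The quadratic potential $V(q;\theta) = \theta q^2/2$ is an illustrative assumption; for it the Gibbs density is Gaussian with variance $1/(\beta\theta)$, so the formula predicts $E_\infty = -\theta/2$ independently of $\beta$. The function name and the truncation of the integral to $[-10,10]$ are hypothetical choices.

```python
import math
from typing import Callable


def langevin_bias_term(
    potential: Callable[[float], float],
    grad_potential: Callable[[float], float],
    beta: float,
    lower: float = -10.0,
    upper: float = 10.0,
    n_cells: int = 20000,
) -> float:
    """Midpoint-rule estimate of -(beta/2) * E_Gibbs[ |V'(q)|^2 ] in one dimension."""
    h = (upper - lower) / n_cells
    z_v = 0.0        # normalization constant Z_V
    weighted = 0.0   # integral of |V'|^2 * exp(-beta * V)
    for i in range(n_cells):
        q = lower + (i + 0.5) * h
        w = math.exp(-beta * potential(q))
        z_v += w * h
        weighted += grad_potential(q) ** 2 * w * h
    return -0.5 * beta * weighted / z_v


theta, beta = 2.0, 1.5
bias = langevin_bias_term(lambda q: 0.5 * theta * q * q, lambda q: theta * q, beta)
```

For the quadratic potential the computed value agrees with $-\theta/2$ to quadrature accuracy, which makes the size of the likelihood error easy to compare with the parameter being estimated.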

5.2 Motion in a Multiscale Potential 

Consider the equation [19]

\[ \frac{dx}{dt} = -\nabla V^{\epsilon}(x) + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}, \tag{5.5} \]

where

\[ V^{\epsilon}(x) = V(x) + p(x/\epsilon), \]

where the fluctuating part of the potential $p(\cdot)$ is taken to be a smooth 1-periodic function.

Setting $y = x/\epsilon$ we obtain

\[ \frac{dx}{dt} = -\Big(\nabla V(x) + \frac{1}{\epsilon}\nabla p(y)\Big) + \sqrt{2\beta^{-1}}\,\frac{dW}{dt}, \tag{5.6a} \]

\[ \frac{dy}{dt} = -\frac{1}{\epsilon}\Big(\nabla V(x) + \frac{1}{\epsilon}\nabla p(y)\Big) + \frac{1}{\epsilon}\sqrt{2\beta^{-1}}\,\frac{dW}{dt}. \tag{5.6b} \]

In the notation of the general homogenization set-up we have

\[ f_0 = g_0 = -\nabla_yp(y), \qquad f_1 = g_1 = -\nabla V(x) \]

and

\[ \alpha_0 = 0, \qquad \alpha_1 = \beta = \sqrt{2\beta^{-1}}. \]

The fast process has generator

\[ \mathcal{L}_0 = -\nabla_yp(y)\cdot\nabla_y + \beta^{-1}\Delta_y. \]

The invariant density is $\rho(y) = Z_p^{-1}\exp(-\beta p(y))$ with $Z_p = \int_{\mathbb{T}^d}\exp(-\beta p(y))\,dy$. The Poisson equation for $\Phi$ is

\[ \mathcal{L}_0\Phi(y) = \nabla_yp(y). \]

Notice that $\Phi$ is a function of $y$ only. The homogenized equation is

\[ \frac{dX}{dt} = -K\nabla V(X) + \sqrt{2\beta^{-1}K}\,\frac{dW}{dt}, \]

where

\[ K = \int_{\mathbb{T}^d}\big(I + \nabla_y\Phi(y)\big)\big(I + \nabla_y\Phi(y)\big)^T\rho(y)\,dy. \]

Suppose now that the potential contains parameters, $V = V(x,\theta)$, $\theta\in\Theta\subset\mathbb{R}^{\ell}$. We want to estimate the parameter $\theta$, given data from (5.5) and using the homogenized equation

\[ \frac{dX}{dt} = -K\nabla V(X;\theta) + \sqrt{2\beta^{-1}K}\,\frac{dW}{dt}. \]

Theorem 3.13 implies that, for this problem, the maximum likelihood estimator is asymptotically biased and that subsampling at the appropriate rate is necessary for the accurate estimation of the parameter $\theta$. As in the example presented in the previous section, we can calculate explicitly the error term $E_\infty$. For simplicity we will consider the problem in one dimension.

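Proposition 5.2. Assume that the potential $V(x;\theta)\in C^{\infty}(\mathbb{R})$ is such that $e^{-\beta V(x;\theta)}\in L^1(\mathbb{R})$ for every $\beta > 0$ and all $\theta\in\Theta$. Then the error term $E_\infty$, eqn. (3.11), for the SDE (5.5) is given by the formula

\[ E_\infty(\theta) = \big(-1 + Z_p^{-1}\hat{Z}_p^{-1}\big)\frac{\beta}{2Z_V}\int_{\mathbb{R}}|\partial_xV(x;\theta)|^2e^{-\beta V(x;\theta)}\,dx, \tag{5.8} \]

where $Z_V = \int_{\mathbb{R}}e^{-\beta V(q;\theta)}\,dq$, $Z_p = \int_0^1e^{-\beta p(y)}\,dy$, $\hat{Z}_p = \int_0^1e^{\beta p(y)}\,dy$. In particular, $E_\infty < 0$.

Proof. Equations (5.6) in one dimension become

\[ \dot{x} = -\partial_xV(x;\theta) - \frac{1}{\epsilon}\partial_yp(y) + \sqrt{2\beta^{-1}}\,\dot{W}, \tag{5.9a} \]

\[ \dot{y} = -\frac{1}{\epsilon}\partial_xV(x;\theta) - \frac{1}{\epsilon^2}\partial_yp(y) + \sqrt{\frac{2\beta^{-1}}{\epsilon^2}}\,\dot{W}. \tag{5.9b} \]

The invariant measure of this system is (notice that it is independent of $\epsilon$)

\[ \rho(y,x;\theta)\,dx\,dy = Z_V^{-1}(\theta)Z_p^{-1}e^{-\beta V(x;\theta)-\beta p(y)}\,dx\,dy. \]

The homogenized equation is

\[ \dot{X} = -K\partial_xV(X;\theta) + \sqrt{2\beta^{-1}K}\,\dot{W}. \]

The cell problem is

\[ \mathcal{L}_0\phi = \partial_yp \]

and the homogenized coefficient is

\[ K = Z_p^{-1}\hat{Z}_p^{-1}. \]

We have that

\[ \langle u,v\rangle_{a(x)} = \frac{\beta}{2K}uv. \]

The error in the likelihood is

\[ E_\infty(\theta) = \int\!\!\int\Big(\mathcal{L}_1\Gamma(x,y) - \langle F,\mathcal{L}_1\phi\rangle_{a(x)}\Big)\rho(x,y)\,dy\,dx, \]

where

\[ \Gamma = \langle F,\phi\rangle_{a(x)}, \qquad F = -K\partial_xV. \]

We have that

\[ \Gamma(x,y) = \frac{\beta}{2K}\big(-K\partial_xV\phi\big) = -\frac{\beta}{2}\partial_xV\phi. \]

Furthermore

\[ \mathcal{L}_1 = -\partial_xV\partial_y - \partial_yp\,\partial_x + 2\beta^{-1}\partial_x\partial_y. \]

Consequently

\[ \mathcal{L}_1\Gamma(x,y) = \frac{\beta}{2}\Big(|\partial_xV|^2\partial_y\phi + \partial_yp\,\partial_x^2V\phi - 2\beta^{-1}\partial_x^2V\partial_y\phi\Big). \]

In addition,

\[ \langle F,\mathcal{L}_1\phi\rangle_{a(x)} = \frac{\beta}{2}|\partial_xV|^2\partial_y\phi. \]

The error in the likelihood is

\[ E_\infty(\theta) = \frac{\beta}{2}\int\!\!\int\big(-\partial_yp\,\partial_x^2V\phi + 2\beta^{-1}\partial_x^2V\partial_y\phi\big)Z_V^{-1}Z_p^{-1}e^{-\beta V(x;\theta)-\beta p(y)}\,dx\,dy \]

\[ = -\frac{\beta}{2}Z_V^{-1}\int\partial_x^2Ve^{-\beta V(x;\theta)}\,dx\;Z_p^{-1}\int\partial_yp\,\phi\,e^{-\beta p(y)}\,dy + Z_V^{-1}Z_p^{-1}\int\partial_x^2Ve^{-\beta V(x;\theta)}\,dx\int\partial_y\phi\,e^{-\beta p(y)}\,dy \]

\[ = \frac{1}{2}Z_V^{-1}Z_p^{-1}\int\partial_x^2Ve^{-\beta V(x;\theta)}\,dx\int\partial_y\phi\,e^{-\beta p(y)}\,dy \]

\[ = \frac{\beta}{2}Z_V^{-1}\int|\partial_xV|^2e^{-\beta V(x;\theta)}\,dx\,\big(-1 + Z_p^{-1}\hat{Z}_p^{-1}\big). \]

In the above derivation we used various integrations by parts, together with the formula for the derivative of the solution of the Poisson equation, $\partial_y\phi = -1 + \hat{Z}_p^{-1}e^{\beta p(y)}$ [20, p. 213]. The fact that $E_\infty$ is nonpositive follows from the inequality $Z_p^{-1}\hat{Z}_p^{-1} < 1$ (for $p(y)$ not identically equal to 0), which follows from the Cauchy-Schwarz inequality.

□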
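The Cauchy-Schwarz inequality used at the end of the proof, $Z_p\hat{Z}_p\ge 1$ with $Z_p = \int_0^1e^{-\beta p(y)}\,dy$ and $\hat{Z}_p = \int_0^1e^{\beta p(y)}\,dy$, is easy to check numerically. The sketch below estimates the factor $Z_p^{-1}\hat{Z}_p^{-1}$ by midpoint quadrature over one period. The choice $p(y) = \cos(2\pi y)$ and the function name are illustrative assumptions, not taken from the text.

```python
import math
from typing import Callable


def partition_factor(p: Callable[[float], float], beta: float, n_cells: int = 2000) -> float:
    """Midpoint-rule estimate of 1 / (Z_p * Zhat_p) for a 1-periodic function p."""
    h = 1.0 / n_cells
    z_p = sum(math.exp(-beta * p((i + 0.5) * h)) for i in range(n_cells)) * h
    z_hat = sum(math.exp(beta * p((i + 0.5) * h)) for i in range(n_cells)) * h
    return 1.0 / (z_p * z_hat)


factor_cos = partition_factor(lambda y: math.cos(2.0 * math.pi * y), beta=1.0)
factor_flat = partition_factor(lambda y: 0.0, beta=1.0)
```

The factor equals one for a flat potential and drops below one as soon as $p$ fluctuates; increasing `beta` in this sketch makes it decay rapidly, in line with the Laplace-method remark that follows.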

Remark 5.3. An application of Laplace's method shows that, for $\beta\gg 1$, $Z_p^{-1}\hat{Z}_p^{-1}\approx$
e~ 2 1

6 Conclusions



The problem of parameter estimation for fast/slow systems of SDEs which admit a 
coarse-grained description in terms of an SDE for the slow variable was studied in 
this paper. It was shown that, when applied to the averaging problem, the maximum 
likelihood estimator (MLE) is asymptotically unbiased and we can use it to estimate 
accurately the parameters in the drift coefficient of the coarse-grained model using data 
from the slow variable in the fast/slow system. On the contrary, the MLE is asymptoti- 
cally biased when applied to the homogenization problem and a systematic asymptotic 
error appears in the log-likelihood function, in the long time/infinite scale separation 
limit. The MLE can lead to the correct estimation of the parameters in the drift co- 
efficient of the homogenized equation provided that we subsample the data from the 
fast/slow system at the appropriate sampling rate. 

The averaging/homogenization systems of SDEs that we consider in this paper are of quite general form and have been studied quite extensively in the last several decades since they appear in various applications, e.g. molecular dynamics, chemical kinetics, mathematical finance, atmosphere/ocean science; see the references in [20]. Thus, we believe that our results show that great care has to be taken when using maximum likelihood in order to infer information about parameters in stochastic systems with multiple characteristic time scales.

There are various problems, both of theoretical and of applied interest, that remain 
open and that we plan to address in future work. We list some of them below. 

• Bayesian techniques for parameter estimation of multiscale diffusion processes.

• The development of efficient algorithms for estimating the parameters in the coarse-grained model of a fast/slow stochastic system. Based on the work that has been done on similar models in the context of econometrics [13, 2], one expects that such an algorithm would involve the estimation of an appropriate measure of scale separation $\epsilon$, and of the optimal sampling rate, averaging over all the available data and a bias reduction step.

• Investigate whether there is any advantage in using random sampling rates.

• Investigate similar issues for deterministic fast/slow systems of differential equations.

A Appendix

A.1 An Ergodic Theorem with Convergence Rates

Consider the SDE

\[ \frac{dz}{dt} = h(z) + \gamma(z)\frac{dW}{dt}, \tag{A.1} \]

with $z\in\mathcal{Z}$, where $\mathcal{Z}$ is either $\mathbb{R}^k$ or $\mathbb{T}^k$, $h:\mathcal{Z}\to\mathbb{R}^k$, $\gamma:\mathcal{Z}\to\mathbb{R}^{k\times p}$ and $W\in\mathbb{R}^p$ a standard Brownian motion. Assume that $h$, $\gamma$ are $C^{\infty}$ with bounded derivatives. Let $\psi:\mathcal{Z}\to\mathbb{R}$ be bounded, and $\phi:\mathcal{Z}\to\mathbb{R}$ be bounded. We denote the generator of the Markov process (A.1) by $\mathcal{A}$.

Assumptions A.1. The equation (A.1) is ergodic with invariant measure $\nu(z)\,dz$. Let

\[ \bar\phi = \int_{\mathcal{Z}}\phi(z)\nu(z)\,dz. \]

Then the equation

\[ -\mathcal{A}\Phi = \phi - \bar\phi, \qquad \int_{\mathcal{Z}}\Phi(z)\nu(z)\,dz = 0 \]

has a unique solution $\Phi:\mathcal{Z}\to\mathbb{R}$, with $\Phi$ and $\nabla\Phi$ bounded.
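Lemma A.2. Let

\[ I = \frac{1}{\sqrt{T}}\int_0^T\psi(z(t))\,dW(t). \]

Then there exists a constant $C > 0$: $\mathbb{E}|I|^2\le C$ for all $T > 0$.

Proof. Use the Itô isometry and invoke the boundedness of $\psi$. □

Lemma A.3. Time averages converge to their mean value almost surely. Furthermore there is a constant $C > 0$:

\[ \mathbb{E}\Big|\frac{1}{T}\int_0^T\phi(z(t))\,dt - \bar\phi\Big|^2 \le \frac{C}{T}. \]

Proof. By applying the Itô formula to $\Phi$ we obtain

\[ -\int_0^T\mathcal{A}\Phi(z(t))\,dt = \Phi(z(0)) - \Phi(z(T)) + \int_0^T(\nabla\Phi\gamma)(z(t))\,dW(t). \]

Thus

\[ \frac{1}{T}\int_0^T\phi(z(t))\,dt - \bar\phi = \frac{1}{T}\big(\Phi(z(0)) - \Phi(z(T))\big) + \frac{1}{\sqrt{T}}I, \qquad I = \frac{1}{\sqrt{T}}\int_0^T(\nabla\Phi\gamma)(z(t))\,dW(t). \]

The result concerning $L^2(\Omega)$ convergence follows from boundedness of $\Phi$, $\nabla\Phi$ and $\gamma$, together with Lemma A.2. Almost sure convergence follows from the ergodic theorem.

□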

A.2 Consistency of the Estimators 

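Lemma A.4. Let $(\Omega, P)$ be a probability space and $g_\epsilon:\Omega\times\Theta\to\mathbb{R}$, $g_0:\Theta\to\mathbb{R}$ be such that

\[ \forall\theta\in\Theta,\quad g_\epsilon\to g_0 \ \text{in probability, as } \epsilon\to 0 \tag{A.2} \]

and

\[ \forall\delta,\kappa > 0:\quad P\Big\{\omega : \sup_{|u|>\delta}\big(g_\epsilon(\omega,\theta_0+u) - g_0(\theta_0+u)\big) > \kappa\Big\}\to 0, \ \text{as } \epsilon\to 0, \tag{A.3} \]

where

\[ \theta_0 = \arg\sup_{\theta\in\Theta}g_0(\theta). \]

Moreover, we assume that

\[ \forall\delta > 0,\quad \sup_{|u|>\delta}\big(g_0(\theta_0+u) - g_0(\theta_0)\big) \le -\kappa(\delta) < 0. \tag{A.4} \]

If

\[ \hat\theta_\epsilon(\omega) = \arg\sup_{\theta\in\Theta}g_\epsilon(\omega,\theta) \]

then

\[ \hat\theta_\epsilon\to\theta_0 \ \text{in probability.} \]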

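Proof. First note that $\forall\delta > 0$

\[ P\big\{|\hat\theta_\epsilon - \theta_0| > \delta\big\} \le P\Big\{\sup_{|u|>\delta}\big(g_\epsilon(\omega,\theta_0+u) - g_\epsilon(\omega,\theta_0)\big) \ge 0\Big\}. \tag{A.5} \]

We define

\[ G_\epsilon(\omega;\theta,u) := g_\epsilon(\omega,\theta+u) - g_\epsilon(\omega,\theta) \quad\text{and}\quad G_0(\theta,u) := g_0(\theta+u) - g_0(\theta). \]

Clearly,

\[ \sup_{|u|>\delta}G_\epsilon(\omega;\theta_0,u) \le \sup_{|u|>\delta}\big(G_\epsilon(\omega;\theta_0,u) - G_0(\theta_0,u)\big) + \sup_{|u|>\delta}G_0(\theta_0,u) \]

and thus

\[ P\Big\{\sup_{|u|>\delta}G_\epsilon(\omega;\theta_0,u) \ge 0\Big\} \le P\Big\{\sup_{|u|>\delta}\big(G_\epsilon(\omega;\theta_0,u) - G_0(\theta_0,u)\big) \ge -\sup_{|u|>\delta}G_0(\theta_0,u)\Big\} \]

\[ \le P\Big\{\sup_{|u|>\delta}\big(G_\epsilon(\omega;\theta_0,u) - G_0(\theta_0,u)\big) \ge \kappa(\delta) > 0\Big\} \tag{A.6} \]

by Assumption (A.4). Note that

\[ G_\epsilon(\omega;\theta_0,u) - G_0(\theta_0,u) = \big(g_\epsilon(\omega;\theta_0+u) - g_0(\theta_0+u)\big) - \big(g_\epsilon(\omega;\theta_0) - g_0(\theta_0)\big). \]

So, by conditioning on $\{\omega : |g_\epsilon(\omega;\theta_0) - g_0(\theta_0)| \ge \frac{1}{2}\kappa(\delta)\}$ and using (A.5) and (A.6), we get that

\[ P\big\{|\hat\theta_\epsilon - \theta_0| > \delta\big\} \le P\Big\{\sup_{|u|>\delta}\big(g_\epsilon(\omega,\theta_0+u) - g_0(\theta_0+u)\big) \ge \tfrac{1}{2}\kappa(\delta) > 0\Big\} + P\Big\{|g_\epsilon(\omega,\theta_0) - g_0(\theta_0)| \ge \tfrac{1}{2}\kappa(\delta) > 0\Big\}. \]

Both probabilities on the right-hand side go to zero as $\epsilon\to 0$, by assumptions (A.3) and (A.2) respectively. We conclude that $\hat\theta_\epsilon\to\theta_0$ in probability. □

A.3 Proof of Propositions 4.2 and 4.3

In this section we present the proofs of Propositions 4.2 and 4.3, which we repeat here for the reader's convenience.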

Proposition A.5. Let $(x(t),y(t))$ be the solution of (2.4) and assume that Assumptions 2.1 and 2.4 hold. Then, for $\epsilon$, $\delta$ sufficiently small, the increment of the process $x(t)$ can be written in the form

\[ x_{n+1} - x_n = F(x_n;\theta_0)\,\delta + M_n + R(\epsilon,\delta), \]

where $M_n$ denotes the martingale term

\[ M_n = \int_{n\delta}^{(n+1)\delta}(\nabla_y\Phi\beta + \alpha_0)(x(s),y(s))\,dV(s) + \int_{n\delta}^{(n+1)\delta}\alpha_1(x(s),y(s))\,dU(s) \]

with $\|M_n\|_p\le C\sqrt{\delta}$ and

\[ \|R(\epsilon,\delta)\|_p \le C\big(\delta^{3/2} + \epsilon\delta^{1/2} + \epsilon\big). \]

Proposition A.6. Let $g\in C(\mathcal{X})$ and let Assumptions 3.7 hold. Assume that $\epsilon$ and $N$ are related as in Theorem 4.1. Then

\[ \lim_{\epsilon\to 0}\frac{1}{N}\sum_{n=0}^{N-1}g(x_n) = \mathbb{E}^{\pi}g, \]

where the convergence is in $L^2$ with respect to the measure on initial conditions with density $\pi(x)\rho(y;x)$.

For the proofs of Propositions A.5 and A.6, both used in the proof of Theorem 4.1, we will need the following two technical lemmas. We start with a rough estimate on the increments of the process $x(t)$.

Lemma A.7. Let $(x(t),y(t))$ be the solution of (2.4) and assume that Assumptions 2.1 and 2.4 hold. Let $s\in[n\delta,(n+1)\delta]$. Then, for $\epsilon$, $\delta$ sufficiently small, the following estimate holds:

\[ \|x(s) - x_n\|_p \le C\big(\epsilon + \delta^{1/2}\big). \tag{A.7} \]

Proof. We apply Itô's formula to $\Phi$, the solution of the Poisson equation (2.8), to obtain

\[ x(s) - x_n = -\epsilon\big(\Phi(x(s),y(s)) - \Phi(x_n,y_n)\big) + \int_{n\delta}^{s}(\mathcal{L}_1\Phi + f_1)(x(s),y(s))\,ds \]

\[ + \int_{n\delta}^{s}(\nabla_y\Phi\beta + \alpha_0)(x(s),y(s))\,dV(s) + \int_{n\delta}^{s}\alpha_1(x(s),y(s))\,dU(s) \]

\[ + \epsilon\int_{n\delta}^{s}(\mathcal{L}_2\Phi)(x(s),y(s))\,ds + \epsilon\int_{n\delta}^{s}(\nabla_x\Phi\alpha_0)(x(s),y(s))\,dU(s) + \epsilon\int_{n\delta}^{s}(\nabla_x\Phi\alpha_1)(x(s),y(s))\,dV(s) \]

\[ =: J_1 + J_2 + J_3 + J_4 + J_5 + J_6 + J_7. \]

Our assumptions on $\Phi(x,y)$, together with standard inequalities, imply that

\[ \|J_1\|_p\le C\epsilon, \quad \|J_2\|_p\le C\delta, \quad \|J_3\|_p\le C\delta^{1/2}, \quad \|J_4\|_p\le C\delta^{1/2}, \]

\[ \|J_5\|_p\le C\epsilon\delta, \quad \|J_6\|_p\le C\epsilon\delta^{1/2}, \quad \|J_7\|_p\le C\epsilon\delta^{1/2}. \]

Estimate (A.7) follows from these estimates. □

Using this lemma we can prove the following estimate.

Lemma A.8. Let $h(x,y)$ be a smooth, bounded function, let $(x(t),y(t))$ be the solution of (2.4) and assume that Assumption 2.1 holds. Define

\[ H(x) := \int_{\mathcal{Y}}h(x,y)\,\rho(y;x)\,dy. \]

Then, for $\epsilon$, $\delta$ sufficiently small, the following estimate holds:

\[ \int_{n\delta}^{(n+1)\delta}h(x(s),y(s))\,ds = H(x_n)\,\delta + R(\epsilon,\delta), \tag{A.8} \]

where

\[ \|R(\epsilon,\delta)\|_p \le C\big(\epsilon^2 + \delta^{3/2} + \epsilon\delta^{1/2}\big). \]

Proof. Let $\phi$ be the mean zero solution of the equation

\[ -\mathcal{L}_0\phi = h(x,y) - H(x). \tag{A.9} \]

By Assumption 2.1 this solution is smooth in both $x$, $y$ and it is unique and bounded. We apply Itô's formula to obtain

\[ \int_{n\delta}^{(n+1)\delta}\big(h(x(s),y(s)) - H(x(s))\big)ds = -\epsilon^2\big(\phi(x_{n+1},y_{n+1}) - \phi(x_n,y_n)\big) \]

\[ + \epsilon\int_{n\delta}^{(n+1)\delta}\mathcal{L}_1\phi(x(s),y(s))\,ds + \epsilon^2\int_{n\delta}^{(n+1)\delta}\mathcal{L}_2\phi(x(s),y(s))\,ds \]

\[ + \epsilon^2\int_{n\delta}^{(n+1)\delta}(\nabla_x\phi\alpha_0)(x(s),y(s))\,dU(s) + \epsilon\int_{n\delta}^{(n+1)\delta}(\nabla_y\phi\beta + \epsilon\nabla_x\phi\alpha_1)(x(s),y(s))\,dV(s) \]

\[ =: J_1 + J_2 + J_3 + J_4 + J_5. \]

Our assumptions on the solution $\phi$ of the Poisson equation (A.9), together with standard estimates for the moments of stochastic integrals and Hölder's inequality, give the estimates

\[ \|J_1\|_p\le C\epsilon^2, \quad \|J_2\|_p\le C\epsilon\delta, \quad \|J_3\|_p\le C\epsilon^2\delta, \quad \|J_4\|_p\le C\epsilon^2\delta^{1/2}, \quad \|J_5\|_p\le C\epsilon\delta^{1/2}. \]

The above estimates imply that

\[ \int_{n\delta}^{(n+1)\delta}h(x(s),y(s))\,ds = \int_{n\delta}^{(n+1)\delta}H(x(s))\,ds + R_1(\epsilon,\delta) \]

with

\[ \|R_1(\epsilon,\delta)\|_p \le C\big(\epsilon\delta^{1/2} + \epsilon^2\big). \]

We use the Hölder inequality and the Lipschitz continuity of $H(x)$ to estimate:

\[ \Big\|\int_{n\delta}^{(n+1)\delta}H(x(s))\,ds - H(x_n)\,\delta\Big\|_p^p = \Big\|\int_{n\delta}^{(n+1)\delta}\big(H(x(s)) - H(x_n)\big)ds\Big\|_p^p \]

\[ \le C\delta^{p-1}\int_{n\delta}^{(n+1)\delta}\|H(x(s)) - H(x_n)\|_p^p\,ds \le C\delta^{p-1}\int_{n\delta}^{(n+1)\delta}\|x(s) - x_n\|_p^p\,ds \le R_2(\epsilon,\delta)^p, \]

where Lemma A.7 was used and $R_2(\epsilon,\delta) = C(\epsilon\delta + \delta^{3/2})$. We combine the above estimates to obtain

\[ \int_{n\delta}^{(n+1)\delta}h(x(s),y(s))\,ds = \int_{n\delta}^{(n+1)\delta}H(x(s))\,ds + R_1(\epsilon,\delta) = H(x_n)\,\delta + R_1(\epsilon,\delta) + R_2(\epsilon,\delta), \]

from which (A.8) follows.

□



Proof of Proposition 4.2 (Proposition A.5). This follows from the first line of the proof of Lemma A.7, the estimates therein concerning all the $J_i$ with the exception of $J_2$, and the use of Lemma A.8 to estimate $J_2$ in terms of $\delta F(x_n;\theta_0)$. □

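Proof of Proposition 4.3 (Proposition A.6). We have

\[ \frac{1}{N}\sum_{n=0}^{N-1}g(x_n) = \frac{1}{N\delta}\sum_{n=0}^{N-1}\int_{n\delta}^{(n+1)\delta}g(x_n)\,ds \]

\[ = \frac{1}{N\delta}\sum_{n=0}^{N-1}\int_{n\delta}^{(n+1)\delta}g(x(s))\,ds + \frac{1}{N\delta}\sum_{n=0}^{N-1}\int_{n\delta}^{(n+1)\delta}\big(g(x_n) - g(x(s))\big)ds \]

\[ = \frac{1}{N\delta}\int_0^{N\delta}g(x(s))\,ds + \frac{1}{N\delta}\sum_{n=0}^{N-1}\int_{n\delta}^{(n+1)\delta}\big(g(x_n) - g(x(s))\big)ds \]

\[ =: I_1 + R_1. \]

We introduce the notation

\[ f_n := \int_{n\delta}^{(n+1)\delta}\big(g(x_n) - g(x(s))\big)ds. \]

By Lemma A.7 we have that $x(s) - x_n = O(\epsilon + \delta^{1/2})$ in $L^p(\Omega')$. We use this, together with the Lipschitz continuity of $g$ and Hölder's inequality, to estimate:

\[ \|f_n\|_p^p \le \delta^{p/q}\int_{n\delta}^{(n+1)\delta}\mathbb{E}|g(x_n) - g(x(s))|^p\,ds \le C\delta^{1+p/q}\big(\epsilon^p + \delta^{p/2}\big). \]

Here $p^{-1} + q^{-1} = 1$. Using this we can estimate $R_1$ using:

\[ \|R_1\|_p \le \frac{1}{N\delta}\sum_{n=0}^{N-1}\|f_n\|_p \le C\frac{1}{\delta}\delta^{(1/p+1/q)}\big(\epsilon + \delta^{1/2}\big) = C\big(\epsilon + \delta^{1/2}\big)\to 0, \]

as $\epsilon\to 0$.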

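Thus it remains to estimate $I_1$. Let $T = N\delta$. Let $\psi^{\epsilon}$ solve

\[ -\mathcal{L}_{hom}\psi^{\epsilon}(x,y) = \hat{g}(x) := g(x) - \mathbb{E}^{\rho^{\epsilon}}g. \tag{A.10} \]

Apply Itô's formula. This gives

\[ \frac{1}{T}\int_0^Tg(x(s))\,ds - \mathbb{E}^{\rho^{\epsilon}}g = -\frac{1}{T}\big(\psi^{\epsilon}(x(T),y(T)) - \psi^{\epsilon}(x(0),y(0))\big) \]

\[ + \frac{1}{\epsilon T}\int_0^T(\nabla_y\psi^{\epsilon}\beta)(x(s),y(s))\,dV(s) + \frac{1}{T}\int_0^T(\nabla_x\psi^{\epsilon}\alpha)(x(s),y(s))\,dU'(s) \]

\[ =: J_1 + J_2, \]

where $J_2$ denotes the two stochastic integrals and we write $\alpha\,dU' = \alpha_0\,dU + \alpha_1\,dV$, in law. Note that

\[ \mathbb{E}^{\rho^{\epsilon}}g \to \mathbb{E}^{\pi}g \]

as $\epsilon\to 0$ by Assumptions 3.7. Thus the theorem will be proved if we can show that $J_1 + J_2$ tends to zero in the required topology on the initial conditions. Note that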

W\J,\ 2 <^W'm\ 

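Here $\Sigma$ is defined in Assumptions 3.8 and $\nabla$ is the gradient with respect to $(x^T,y^T)^T$. We note that, by stationarity, we have that

\[ \mathbb{E}^{\rho^{\epsilon}}|\psi^{\epsilon}|^2 = \|\psi^{\epsilon}\|^2, \qquad \mathbb{E}^{\rho^{\epsilon}}\langle\nabla\psi^{\epsilon},\Sigma\nabla\psi^{\epsilon}\rangle = (\nabla\psi^{\epsilon},\Sigma\nabla\psi^{\epsilon}), \tag{A.11} \]

where $\|\cdot\|$ and $(\cdot,\cdot)$ denote the $L^2(\mathcal{X}\times\mathcal{Y};\rho^{\epsilon}(x,y)\,dx\,dy)$ norm and inner product, respectively.

Use of the Dirichlet form (see Theorem 6.12 in [20]) shows that

\[ (\nabla\psi^{\epsilon},\Sigma\nabla\psi^{\epsilon}) \le 2\int\hat{g}(x)\psi^{\epsilon}(x,y)\rho^{\epsilon}(x,y)\,dx\,dy \le a\|\hat{g}\|^2 + a^{-1}\|\psi^{\epsilon}\|^2 \]

for any $a > 0$. Using the Poincaré inequality (3.9), together with Assumptions 3.7 and 3.8, gives

\[ \|\psi^{\epsilon}\|^2 \le C_p^2\|\nabla\psi^{\epsilon}\|^2 \le aC^{-1}C_p^2\|\hat{g}\|^2 + a^{-1}C^{-1}C_p^2\|\psi^{\epsilon}\|^2. \]

Choosing $a$ so that $a^{-1}C^{-1}C_p^2 = \frac{1}{2}$ gives

\[ \|\psi^{\epsilon}\|^2 \le C\,\mathbb{E}^{\rho^{\epsilon}}|\hat{g}|^2. \]

Hence

\[ (\nabla\psi^{\epsilon},\Sigma\nabla\psi^{\epsilon}) \le C\,\mathbb{E}^{\rho^{\epsilon}}|\hat{g}|^2, \]

where the notation introduced in (A.11) was used. The constant $C$ in the above inequalities is independent of $\epsilon$. Thus

\[ \mathbb{E}^{\rho^{\epsilon}}|J_1|^2 + \mathbb{E}^{\rho^{\epsilon}}|J_2|^2 \le \frac{1}{T}C\,\mathbb{E}^{\rho^{\epsilon}}|\hat{g}|^2. \tag{A.12} \]

Since the measure with density $\rho^{\epsilon}$ converges to the measure with density $\pi(x)\rho(y;x)$ the desired result follows. □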